MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
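As a rough illustration of how the three kinds of noise listed above might be handled for a categorical column such as Device OS, the sketch below maps each cell to a known value using Python's standard difflib module. The vocabulary VALID_OS, the helper name clean_category, and the 0.8 similarity cutoff are assumptions made for this example; they are not part of the dataset.

```python
import difflib

# Hypothetical vocabulary of valid values for the "Device OS" column.
VALID_OS = ["Android", "iOS"]


def clean_category(value, valid_values, cutoff=0.8):
    """Map a possibly corrupted cell to a known value, or None if unrecoverable."""
    if value is None or not str(value).strip():
        return None  # a missing value stays missing
    text = str(value).strip().lower()
    lookup = {v.lower(): v for v in valid_values}
    if text in lookup:
        return lookup[text]  # exact match, ignoring letter case
    # get_close_matches ranks candidates by similarity ratio and drops
    # anything below the cutoff, which filters out random string injections.
    matches = difflib.get_close_matches(text, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


raw = ["Android", "Andro1d", None, "iOS", "x9Kq2LmZ", "iOs"]
cleaned = [clean_category(v, VALID_OS) for v in raw]
print(cleaned)  # ['Android', 'Android', None, 'iOS', None, 'iOS']
```

The cutoff is a trade-off: for very short values such as “iOS” a single-character edit already falls below 0.8, so such cells end up as None rather than being repaired.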
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
https://creativecommons.org/publicdomain/zero/1.0/
From Harvard Dataverse
Description: We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include USA, China, Japan, Germany, France, Brazil, UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about:
(1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download;
(2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income;
(3) their personality, using the Big-Five personality traits.
This dataset contains the results of the survey.
Author: Lim, Soo Ling, 2014, "Worldwide Mobile App User Behavior Dataset", https://doi.org/10.7910/DVN/27459, Harvard Dataverse, V1
Author affiliation: University College London
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive insights into mobile app usage patterns across different categories, including education, social media, productivity, entertainment, health, news, and shopping applications. It contains screen time data for 500 users with demographic information such as age and gender, making it valuable for analyzing digital behavior patterns and productivity correlations.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
🚀 MobileViews: A Large-Scale Mobile GUI Dataset
MobileViews is a large-scale dataset designed to support research on mobile agents and mobile user interface (UI) analysis. The first release, MobileViews-600K, includes over 600,000 mobile UI screenshot-view hierarchy (VH) pairs collected from over 20,000 apps on the Google Play Store. This dataset is based on DroidBot, which we have optimized for large-scale data collection, capturing more comprehensive interaction details while… See the full description on the dataset page: https://huggingface.co/datasets/mllmTeam/MobileViews.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset simulates anonymized mobile screen time and app usage data collected from Android/iOS users over a 3-month period (Jan–April 2024). It captures daily usage trends across various app categories including:
Productivity: Google Docs, Notion, Slack
Entertainment: YouTube, Netflix, TikTok
Social Media: Instagram, WhatsApp, Facebook
Utilities: Chrome, Gmail, Maps
For YouTube, additional engagement statistics such as views, likes, and comments are included to analyze video popularity and content consumption behavior.
The dataset enables exploration of:
Productivity vs. entertainment screen time patterns
Daily usage fluctuations
App-specific user engagement
Correlation between time spent and user interactions
YouTube content virality metrics
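One of the explorations listed above, the correlation between time spent and user interactions, could be sketched as follows. This is a minimal Pearson correlation written with the standard library only; the function name pearson and the sample values are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values for one app: minutes of screen time and
# number of interactions (likes, comments, and so on).
minutes = [12, 45, 30, 60, 5]
interactions = [3, 14, 9, 20, 1]
r = pearson(minutes, interactions)
print(round(r, 3))
```

A value of r close to 1 would suggest that days with more screen time also had more interactions; it says nothing about causation.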
This is a great resource for:
EDA projects
Behavioral clustering
Dashboard development
Time series and anomaly detection
Building recommendation or focus-assistive apps
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
During the study period
https://creativecommons.org/publicdomain/zero/1.0/
| Column Name | Description |
|---|---|
| App | The name of the app as listed on the Google Play Store. |
| Category | The category to which the app belongs (e.g., ART_AND_DESIGN, GAME). |
| Rating | The user rating of the app on a scale from 1 to 5. |
| Reviews | The number of user reviews for the app. |
| Size | The size of the app in megabytes (MB) or kilobytes (KB). |
| Installs | The number of installs/downloads of the app (e.g., 10,000+). |
| Type | Indicates whether the app is free or paid. |
| Price | The price of the app in USD, if it is a paid app. |
| Content Rating | The target audience for the app (e.g., Everyone, Teen, Mature 17+). |
| Genres | The genres associated with the app (e.g., Art & Design, Creativity). |
| Last Updated | The date when the app was last updated. |
| Current Ver | The current version of the app. |
| Android Ver | The minimum Android version required to run the app. |
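Several columns in the table above hold numbers as text, so they usually need parsing before analysis. The sketch below assumes that Installs values look like "10,000+" (as in the table) and that Size values look like "19M" or "201k"; the Size format, the 1 MB = 1024 KB conversion, and the function names are assumptions made for illustration.

```python
def parse_installs(text):
    """Turn an installs string such as '10,000+' into an int, else None."""
    digits = text.strip().rstrip("+").replace(",", "")
    return int(digits) if digits.isdigit() else None


def parse_size_mb(text):
    """Convert a size string such as '19M' or '201k' to megabytes, else None."""
    text = text.strip()
    if not text:
        return None
    # Assumed unit suffixes: M for megabytes, k for kilobytes.
    factor = {"M": 1.0, "K": 1 / 1024}.get(text[-1].upper())
    if factor is None:
        return None
    try:
        return float(text[:-1]) * factor
    except ValueError:
        return None


print(parse_installs("10,000+"))
print(parse_size_mb("19M"))
print(parse_size_mb("512k"))
```

Values that do not fit the assumed patterns are returned as None so that they can be counted and inspected rather than silently coerced.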
https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: not specified.
Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
title: The title given to the review by the user, often summarizing their main point.
isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
Reviewer & Rating Information:
username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
App & Origin Context:
app_name: The name of the application being reviewed.
app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
Metadata & Timestamps:
_id: A unique identifier for the specific review record in the dataset.
crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
date: The original date the review was posted by the user on the App Store.
Expanded Use Cases & Analytical Applications:
This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
Use review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
Use review text to inform future product roadmap decisions and develop features users actively desire.
… review field.
Track rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
Market Research & Competitive Intelligence:
Marketing & App Store Optimization (ASO):
Analyze the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
Academic & Data Science Research:
The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
Examine rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
Compare country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.
This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset consists of the permissions that apps require at installation and at run time. We collected apps from three different sources: Google Play, third-party apps, and a malware dataset. The collection contains more than 500,000 Android apps, with features extracted at the time of installation and execution. One file contains the names of the features, and the others contain, for each .apk file, the corresponding extracted permissions and API calls. Benign apps were collected from Google's Play Store, hiapk, app china, Android, mumayi, gfan, slideme, and pandaapp. These .apk files were collected continuously over the last three years and contain 81 distinct malware families.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains features extracted from Hybrid Apps available for deployment on the Android platform until recently. The data have been drawn from various sources, including existing similar datasets and the Google Play Store or its mirrors. The dataset is labelled to differentiate malicious and benign Hybrid Apps, so it can conveniently be used for supervised learning. It also has adequate attributes to support unsupervised learning tasks. The dataset comprises 78,767 samples.
https://www.paradoxintelligence.com/terms
App usage patterns and mobile behavior analytics providing digital ecosystem insights for institutional investors.
As of May 2023, the mobile app version of popular ********************************* used ** of the data points they collected to track their iOS users, as well as collecting ** data points connected to the user's identity. Facebook, which was identified as the most data-hungry app among all the mobile social media, used ***** of its ** collected data points to track users. Dating app ****** collected ** data points connected to the users' identity, as well as **** data points to track users' activity.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The Mobile Device Usage and User Behavior Dataset contains 700 structured samples including app usage metrics, screen time patterns, battery consumption data, session duration, and user behavior classification labels. Designed for AI/ML model training, behavioral analytics, predictive modeling, and mobile performance optimization research.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset comprises user feedback data collected from 15 globally acclaimed mobile applications, spanning diverse categories. The included applications are among the most downloaded worldwide, providing a rich and varied source for analysis. The dataset is particularly suitable for Natural Language Processing (NLP) applications, such as text classification and topic modeling.
This dataset is open access for scientific research and non-commercial purposes. Users are required to acknowledge the authors' work and, in the case of scientific publication, cite the most appropriate reference:
1. Paper
M. H. Asnawi, A. A. Pravitasari, T. Herawan, and T. Hendrawati, "The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling," in IEEE Access, vol. 11, pp. 130272-130286, 2023, doi: https://doi.org/10.1109/ACCESS.2023.3332644
2. Dataset
Asnawi, M. H., Pravitasari, A. A., Herawan, T., & Hendrawati, T. (2023). User Feedback Dataset from the Top 15 Downloaded Mobile Applications [Data set]. In The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling (1.0.0, Vol. 11, pp. 130272–130286). Zenodo. https://doi.org/10.5281/zenodo.10204232
Researchers and analysts are encouraged to explore this dataset for insights into user sentiments, preferences, and trends across these top mobile applications. If you have any questions or need further information, feel free to contact the dataset authors.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This repository is part of the ITC-NetMingledApp dataset, which includes network traffic data from 36 Android applications, with each capture featuring concurrent traffic from multiple applications and smartphones. This repository contains data related to the Iran-Qom scenario. Each capture is stored in a compressed file containing the relevant PCAP files of the associated applications. The PCAP files are named according to a convention: {TimeStamp}_{Application Name}{Download-Upload Speed}.pcap
These data have been collected and supplied by Huq Ltd. and comprise records for the period July 2016 to October 2020. The data contain aggregated geolocated activity counts derived from mobile phone app use across Great Britain.
Mobile phone applications seek users’ consent for recording and storing the mobile device’s location when the app is in use. Activity counts are derived from these locations as the sum of distinct devices per grid cell per day. These data can be used as a proxy for estimating activity levels and footfall across the UK.
These aggregate data were created from record level data which comprised individual phone IDs, and multiple entries for each mobile device if it is used multiple times for one app or the user accesses multiple apps. Thus, the following data cleaning and aggregation process has been used to create the activity counts:
Cleaning: Daily records comprise unique device ID, time-stamp and location of each entry collected by any app. The time-stamp is reformatted as a single daily date attribute.
Spatial linkage to OSGB grid: After turning the daily data-frames into spatial objects, the files are joined to the 1km x 1km OSGB grid, and each impression is attributed a grid cell ID corresponding to its latitude and longitude.
Creation of activity counts: Activity counts are created following the previous steps by counting the number of unique device IDs per grid cell per date. This removes multiple appearances of the same device (one device may collect multiple impressions through different apps or due to frequent usage). The final activity count corresponds to the number of unique devices within a 1km square for that day.
Output: The output comprises cleaned aggregation counts for each grid cell and day.
N.B. More detail on how the data were collected, and on their coverage, is available if you request it in your initial application purpose, or if you contact us by email once you have made your initial application and received the form. Applicants would need to sign a non-disclosure agreement before accessing this detail, and such a request will significantly increase the time for data delivery. You can, of course, make a full application for the data without first receiving this collection/coverage metadata.
These data are provided at 1km x 1km OSGB Grid cells.
Activity counts of 1-10 devices are masked and replaced by “*” in the database, as low counts present potentially identifiable information.
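The aggregation and masking steps described above can be sketched in a few lines. The example assumes each record has already been linked to a grid cell ID and reduced to a daily date; the record layout, the function name activity_counts, and the sample values are hypothetical, and this is not the provider's actual pipeline.

```python
from collections import defaultdict


def activity_counts(records, mask_max=10):
    """Count distinct devices per (grid cell, date), masking low counts.

    records: iterable of (device_id, date, grid_cell_id) tuples.
    Counts from 1 up to mask_max are replaced by "*".
    """
    devices = defaultdict(set)
    for device_id, date, cell_id in records:
        # A set keeps one entry per device, however many impressions it made.
        devices[(cell_id, date)].add(device_id)
    return {
        key: ("*" if len(ids) <= mask_max else len(ids))
        for key, ids in devices.items()
    }


# Hypothetical impressions: 12 distinct devices in cell_A (one seen twice)
# and 2 distinct devices in cell_B (one seen twice).
records = [(f"dev{i}", "2020-01-01", "cell_A") for i in range(12)]
records.append(("dev0", "2020-01-01", "cell_A"))
records += [
    ("dev1", "2020-01-01", "cell_B"),
    ("dev1", "2020-01-01", "cell_B"),
    ("dev2", "2020-01-01", "cell_B"),
]
counts = activity_counts(records)
print(counts)
```

In this sketch cell_A keeps its count of 12, while cell_B, with only two distinct devices, is reported as "*".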
For detailed description of the columns contained within the data, see the Variable Dictionary; and for an overview of the characteristics of the data, see the Data Summary. These files can be downloaded from the bottom of this page.
Excellent quality and coverage for major towns and cities. The data may be less complete for smaller settlements or more rural areas. Data are subject to suppression of potentially disclosive low counts as detailed above. Huq collects data from a varying mix of apps, the identities of which are commercially sensitive. Apps may be added to or deleted from the secure and summary data products over time. This, along with increasing national coverage and mobile phone uptake, results in general increases in apparent activity over the period covered by the data.
The dataset would benefit from comparison with population estimates (e.g. census data) to investigate coverage issues. 2016 data have the highest percentage of suppressed counts, and data suppression generally decreases over time, particularly in metropolitan (Met) areas. Data suppression levels in metropolitan areas generally fall below 50% by 2020.
Dataset Card for Dataset Name
Dataset Summary
MobileRec is a large-scale app recommendation dataset. There are 19.3 million user-item interactions. This is a 5-core dataset. User-item interactions are sorted in ascending chronological order. There are 0.7 million users who have had at least five distinct interactions. There are 10,173 apps in total.
Supported Tasks and Leaderboards
Sequential Recommendation
Languages
English
How to use the… See the full description on the dataset page: https://huggingface.co/datasets/recmeapp/mobilerec.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Description: Frappe Processed Dataset
The Frappe dataset has been processed to refine the quality of user-item interactions by removing entries where either users or items had fewer than 5 interactions. This pruning resulted in a significant reduction in the dataset size:
Number of Users: 651 (a reduction of 31.97% from the original dataset)
Number of Items: 1127 (a reduction of 72.39%)
Total Number of Interactions: 84,373 (a reduction of 12.30%)
Columns Overview: The dataset… See the full description on the dataset page: https://huggingface.co/datasets/abadesalex/Frappe-mobile-app-usage.
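The pruning rule in this entry (dropping users or items with fewer than 5 interactions) can be illustrated with a small k-core filter. The description does not say whether the filter was applied once or repeated until stable; the sketch below assumes the repeated variant, and the function name k_core and the sample pairs are hypothetical.

```python
from collections import Counter


def k_core(interactions, k=5):
    """Keep (user, item) pairs whose user and item both have >= k interactions.

    The filter is repeated because removing one pair can push another
    user or item below the threshold.
    """
    current = list(interactions)
    while True:
        user_counts = Counter(user for user, _ in current)
        item_counts = Counter(item for _, item in current)
        kept = [
            (user, item)
            for user, item in current
            if user_counts[user] >= k and item_counts[item] >= k
        ]
        if len(kept) == len(current):
            return kept  # nothing was removed, so the result is stable
        current = kept


# Small hypothetical example with k=2: item "c" is dropped first, which
# leaves user "u3" with a single interaction, so "u3" is dropped next.
pairs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
core = k_core(pairs, k=2)
print(core)  # [('u1', 'a'), ('u1', 'b'), ('u2', 'a'), ('u2', 'b')]
```

A single-pass filter would stop after removing ("u3", "c") and keep ("u3", "a"), so the two variants can give different user and item counts.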
https://crawlfeeds.com/privacy_policy
This comprehensive iOS application reviews dataset contains thousands of authentic user reviews from the Apple App Store in English. The dataset provides valuable insights for app developers, marketers, and researchers studying mobile application performance and user sentiment.
Key Features:
Applications: Perfect for sentiment analysis, app store optimization, mobile app development research, user experience studies, and competitive analysis. This dataset enables businesses to understand user preferences, identify app improvement opportunities, and develop better mobile applications.
Data Quality: All reviews are genuine user feedback collected from the official Apple App Store, ensuring authenticity and reliability for research and business intelligence purposes. The dataset covers various app categories including fitness, shopping, education, entertainment, and productivity applications.
As of May 2023, the mobile app of shopping and marketplace platform Etsy used approximately half of its collected data points to track users. In comparison, health app Noom used only *** of its collected user data points for tracking purposes.