License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows, each representing a single event or action performed by a synthetic user. It was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating whether the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating whether the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells are blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
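A minimal sketch of repairing the typo-style corruption described above. It assumes the values are already loaded as plain Python strings and that the Device OS column should only contain “Android” or “iOS”. The valid-value list, the 0.6 similarity cutoff, and the `clean_category` helper are assumptions for illustration. The standard library's `difflib.get_close_matches` snaps near-misses such as “Andro1d” back to a valid value, and strings that resemble no valid value (likely random injections) are treated as missing.

```python
import difflib

# Assumed canonical values for the Device OS column (Android/iOS per the description).
VALID_OS = ["Android", "iOS"]


def clean_category(value, valid, cutoff=0.6):
    """Map a possibly corrupted categorical value to its closest valid value.

    Returns None for missing values and for strings that are not similar to
    any valid value (e.g. injected random strings), so they can be imputed or
    dropped like other missing data.
    """
    if value is None or not str(value).strip():
        return None
    value = str(value).strip()
    if value in valid:
        return value
    # Closest valid value whose similarity ratio is at least `cutoff`, if any.
    matches = difflib.get_close_matches(value, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


raw = ["Android", "Andro1d", "iOS", "i0S", None, "x9Qz7Lk"]
print([clean_category(v, VALID_OS) for v in raw])
# ['Android', 'Android', 'iOS', 'iOS', None, None]
```

The same approach only suits categorical columns with a small known vocabulary. Numeric columns such as Battery Level or Memory Usage need range checks and type coercion instead.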
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).
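As one example of the feature-engineering use case, the sketch below computes battery drain per session as the first minus the last recorded battery level, skipping missing readings. The record layout `(session_id, timestamp, battery_level)`, the sample values, and the `battery_drain_per_session` helper are hypothetical; adapt them to the actual column names.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event records: (session_id, ISO timestamp, battery level in %).
events = [
    ("S-000001", "2024-05-01T10:00:00", 80),
    ("S-000001", "2024-05-01T10:20:00", 74),
    ("S-000002", "2024-05-01T11:00:00", 50),
    ("S-000002", "2024-05-01T11:05:00", None),  # missing value, skipped
    ("S-000002", "2024-05-01T11:30:00", 47),
]


def battery_drain_per_session(rows):
    """Return {session_id: first battery level - last battery level}.

    Drain is measured in percentage points between the earliest and latest
    readings of each session; events with a missing battery level are ignored.
    """
    by_session = defaultdict(list)
    for session_id, ts, battery in rows:
        if battery is None:
            continue
        by_session[session_id].append((datetime.fromisoformat(ts), battery))
    drains = {}
    for session_id, readings in by_session.items():
        readings.sort()  # chronological order
        drains[session_id] = readings[0][1] - readings[-1][1]
    return drains


drains = battery_drain_per_session(events)
print(drains)  # {'S-000001': 6, 'S-000002': 3}
print(sum(drains.values()) / len(drains))  # 4.5
```

Because every value in this dataset is synthetic, per-session drains may be noisy or even negative. Averaging them per user or per device model is one way to smooth them before modeling.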
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains detailed insights into mobile app usage patterns, including screen time, notifications received, and app openings. The data span multiple days in August and cover several popular apps, offering a granular view of digital behavior.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
We surveyed 10,208 people from more than 15 countries about their mobile app usage behavior. The countries include the USA, China, Japan, Germany, France, Brazil, the UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about:
(1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download;
(2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income;
(3) their personality, using the Big Five personality traits.
This dataset contains the results of the survey.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset simulates anonymized mobile screen time and app usage data collected from Android/iOS users over a 3-month period (Jan–April 2024). It captures daily usage trends across various app categories including:
Productivity: Google Docs, Notion, Slack
Entertainment: YouTube, Netflix, TikTok
Social Media: Instagram, WhatsApp, Facebook
Utilities: Chrome, Gmail, Maps
For YouTube, additional engagement statistics such as views, likes, and comments are included to analyze video popularity and content consumption behavior.
The dataset enables exploration of:
Productivity vs. entertainment screen time patterns
Daily usage fluctuations
App-specific user engagement
Correlation between time spent and user interactions
YouTube content virality metrics
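A minimal sketch of the productivity vs. entertainment comparison. It assumes daily rows of `(date, app, minutes)`. That layout, the sample values, and the `productivity_ratio` helper are hypothetical, while the app-to-category mapping follows the category list above.

```python
from collections import defaultdict

# App-to-category mapping taken from the category list in the description.
CATEGORY = {
    "Google Docs": "Productivity", "Notion": "Productivity", "Slack": "Productivity",
    "YouTube": "Entertainment", "Netflix": "Entertainment", "TikTok": "Entertainment",
    "Instagram": "Social Media", "WhatsApp": "Social Media", "Facebook": "Social Media",
    "Chrome": "Utilities", "Gmail": "Utilities", "Maps": "Utilities",
}

# Hypothetical daily usage rows: (date, app, minutes of screen time).
rows = [
    ("2024-01-01", "Notion", 45), ("2024-01-01", "YouTube", 90),
    ("2024-01-01", "Slack", 30), ("2024-01-02", "Netflix", 60),
    ("2024-01-02", "Google Docs", 120),
]


def productivity_ratio(rows):
    """Return {date: productivity minutes / entertainment minutes}.

    Days without any entertainment time are skipped to avoid dividing by zero.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for date, app, minutes in rows:
        totals[date][CATEGORY.get(app, "Other")] += minutes
    return {
        date: cats["Productivity"] / cats["Entertainment"]
        for date, cats in totals.items()
        if cats["Entertainment"] > 0
    }


print({d: round(r, 2) for d, r in productivity_ratio(rows).items()})
# {'2024-01-01': 0.83, '2024-01-02': 2.0}
```

A ratio above 1 means more productivity than entertainment time on that day. Plotting it over the study period is one simple way to look at daily fluctuations.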
This is a great resource for:
EDA projects
Behavioral clustering
Dashboard development
Time series and anomaly detection
Building recommendation or focus-assistive apps
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
During the study period
License: https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: not specified, so the recency of the data is unknown.
Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
title: The title given to the review by the user, often summarizing their main point.
isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
Reviewer & Rating Information:
username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
App & Origin Context:
app_name: The name of the application being reviewed.
app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
Metadata & Timestamps:
_id: A unique identifier for the specific review record in the dataset.
crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
date: The original date the review was posted by the user on the App Store.
Expanded Use Cases & Analytical Applications:
This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
review text to inform future product roadmap decisions and develop features users actively desire.
review field.
rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
Market Research & Competitive Intelligence:
Marketing & App Store Optimization (ASO):
review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
Academic & Data Science Research:
review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.
This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
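A small sketch of country-level segmentation using the `rating`, `country`, and `isEdited` fields of the App Store reviews dataset. The records are made-up dictionaries, the `summarize_by_country` helper is hypothetical, and the lowercase two-letter country codes are an assumption about how the `country` field is encoded.

```python
from collections import defaultdict

# Hypothetical review records using field names from the field list.
reviews = [
    {"app_name": "ExampleApp", "rating": 5, "country": "us", "isEdited": False},
    {"app_name": "ExampleApp", "rating": 2, "country": "us", "isEdited": True},
    {"app_name": "ExampleApp", "rating": 4, "country": "gb", "isEdited": False},
]


def summarize_by_country(records):
    """Return {country: (mean star rating, share of edited reviews)}."""
    groups = defaultdict(list)
    for r in records:
        groups[r["country"]].append(r)
    summary = {}
    for country, items in groups.items():
        mean_rating = sum(r["rating"] for r in items) / len(items)
        edited_share = sum(bool(r["isEdited"]) for r in items) / len(items)
        summary[country] = (mean_rating, edited_share)
    return summary


print(summarize_by_country(reviews))  # {'us': (3.5, 0.5), 'gb': (4.0, 0.0)}
```

Grouping by `date` (for example by month) instead of `country` gives the rating-trend view mentioned under the use cases.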
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
We built a crawler to collect data from the Google Play store, including each application's metadata and APK file. The manifest files were extracted from the APK files and then processed to extract the features. The dataset is composed of 870,515 records (apps), with 48 features per app. The dataset was used to build and test two bootstrap aggregating (bagging) ensembles of multiple XGBoost classifiers. The data were collected between April 2017 and November 2018. We then checked the status of these applications on three occasions: December 2018, February 2019, and May–June 2019.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
| Column Name | Description |
|---|---|
| App | The name of the app as listed on the Google Play Store. |
| Category | The category to which the app belongs (e.g., ART_AND_DESIGN, GAME). |
| Rating | The user rating of the app on a scale from 1 to 5. |
| Reviews | The number of user reviews for the app. |
| Size | The size of the app in megabytes (MB) or kilobytes (KB). |
| Installs | The number of installs/downloads of the app (e.g., 10,000+). |
| Type | Indicates whether the app is free or paid. |
| Price | The price of the app in USD, if it is a paid app. |
| Content Rating | The target audience for the app (e.g., Everyone, Teen, Mature 17+). |
| Genres | The genres associated with the app (e.g., Art & Design, Creativity). |
| Last Updated | The date when the app was last updated. |
| Current Ver | The current version of the app. |
| Android Ver | The minimum Android version required to run the app. |
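A minimal parsing sketch for the Size, Installs, and Price columns of this Google Play table. The string formats it expects (“19M”, “201k”, or “Varies with device” for Size; “10,000+” for Installs; “$4.99” or “0” for Price) are assumptions about how the values are stored and should be checked against the file. The helper names are hypothetical.

```python
def parse_size_mb(size):
    """Convert a Size string such as '19M' or '201k' to megabytes.

    Returns None for anything else (e.g. 'Varies with device').
    Treats 1 MB as 1024 KB; use 1000 instead if that matches the source better.
    """
    size = size.strip()
    try:
        if size.endswith(("M", "m")):
            return float(size[:-1])
        if size.endswith(("k", "K")):
            return float(size[:-1]) / 1024
    except ValueError:
        pass  # e.g. a non-numeric prefix before the unit
    return None


def parse_installs(installs):
    """Convert an install bucket such as '10,000+' to its lower bound, 10000."""
    return int(installs.replace(",", "").rstrip("+"))


def parse_price(price):
    """Convert a price such as '$4.99' or '0' to a float in USD."""
    return float(price.strip().lstrip("$") or 0)


print(parse_size_mb("19M"), parse_size_mb("512k"), parse_size_mb("Varies with device"))
# 19.0 0.5 None
print(parse_installs("10,000+"), parse_price("$4.99"), parse_price("0"))
# 10000 4.99 0.0
```

Since Installs values such as “10,000+” appear to be buckets, the parsed number is a lower bound rather than an exact count.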
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains the permissions that apps request at installation time and at run time. We collected apps from three different sources: Google Play, third-party apps, and a malware dataset. The collection contains more than 500,000 Android apps, with features extracted at installation and execution time. One file contains the names of the features, and the others contain the permissions and API calls extracted from the corresponding .apk files. Benign apps were collected from Google Play, hiapk, App China, Android, mumayi, gfan, slideme, and pandaapp. The .apk files were collected continuously over the last three years and cover 81 distinct malware families.
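Because one file lists the feature names and the others list the permissions and API calls extracted for each .apk, a common way to feed this data to a classifier is a binary presence vector per app. The sketch below is hypothetical: the four feature names and the two example apps are illustrative, not taken from the dataset's feature file.

```python
# Hypothetical feature-name list (the real one comes from the dataset's feature file).
FEATURES = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.telephony.SmsManager.sendTextMessage",
]

# Hypothetical permissions/API calls extracted from two .apk files.
apps = {
    "benign.apk": ["android.permission.INTERNET"],
    "suspicious.apk": [
        "android.permission.INTERNET",
        "android.permission.SEND_SMS",
        "android.telephony.SmsManager.sendTextMessage",
    ],
}


def to_binary_vector(extracted, features=FEATURES):
    """Return a 0/1 list marking which features appear in the extracted items."""
    present = set(extracted)
    return [1 if f in present else 0 for f in features]


for name, extracted in apps.items():
    print(name, to_binary_vector(extracted))
# benign.apk [1, 0, 0, 0]
# suspicious.apk [1, 0, 1, 1]
```

Keeping the feature order fixed by the feature-name file ensures that every app's vector uses the same column positions.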
License: https://crawlfeeds.com/privacy_policy
Looking for a Google Play apps dataset to analyze mobile app trends? The Google Play Store Apps Dataset delivers ~10,000 app records from the Google Play Store, including key app metadata like app name, category, rating, installs, price, developer details, and more. This dataset is ideal for app market research, mobile analytics, app store optimization (ASO) studies, data science projects, and trend analysis.
Collect structured data on apps across genres and niches, so you can build visualizations, train machine-learning models, analyze user engagement, or compare categories like games, productivity, health & fitness, and finance.
Rich App Metadata: Includes app_id, app_name, category, rating, review_count, price, installs, content_rating, genres, last_updated, current_version, android_version, developer_name, developer_email, …
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The Mobile Device Usage and User Behavior Dataset contains 700 structured samples including app usage metrics, screen time patterns, battery consumption data, session duration, and user behavior classification labels. Designed for AI/ML model training, behavioral analytics, predictive modeling, and mobile performance optimization research.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains features extracted from hybrid apps that were available for deployment on the Android platform until recently. The data were gathered from various sources, including existing similar datasets and the Google Play Store or its mirrors. The dataset is labelled to differentiate malicious from benign hybrid apps, so it can conveniently be used for supervised learning. Nonetheless, it has adequate attributes to support unsupervised learning tasks as well. The dataset comprises 78,767 samples.
As of March 2021, social media apps and food delivery apps were reported to collect the largest amount of data from global iOS users, with 20 data points tracked each. Shopping apps ranked second, with 18 data points tracked from global iOS users, while browser apps and image editing apps had the lowest average number of data points tracked.
These data were collected and supplied by Huq Ltd. and comprise records for the period July 2016 to October 2020. The data contain aggregated, geolocated activity counts derived from mobile phone app use across Great Britain.
Mobile phone applications seek users’ consent for recording and storing the mobile device’s location when the app is in use. Activity counts are derived from these locations as the number of distinct devices per grid cell per day. These data can be used as a proxy for estimating activity levels and footfall across the UK.
These aggregate data were created from record-level data, which comprised individual phone IDs and multiple entries for each mobile device if it was used multiple times for one app or if the user accessed multiple apps. The following data cleaning and aggregation process was therefore used to create the activity counts:
Cleaning: Daily records comprise unique device ID, time-stamp and location of each entry collected by any app. The time-stamp is reformatted as a single daily date attribute.
Spatial linkage to OSGB grid: After turning the daily data-frames into spatial objects, the files are joined to the 1km x 1km OSGB grid, and each impression is attributed a grid cell ID corresponding to its latitude and longitude.
Creation of activity counts: Activity counts are created following the previous steps by counting the number of unique device IDs per grid cell per date. This removes multiple appearances of the same device (one device may collect multiple impressions through different apps or due to frequent usage). The final activity count corresponds to the number of unique devices within a 1km square for that day.
Output: The output comprises cleaned, aggregated counts for each grid cell and day.
N.B. More detail on how the data were collected and on coverage is available if you request this detail in your initial application purpose, or if you contact us by email once you have made your initial application and received the form. Applicants would need to sign a non-disclosure agreement before accessing this detail, and such a request will significantly increase the time for data delivery. You can, of course, make a full application for the data without first receiving this collection/coverage metadata.
These data are provided at 1km x 1km OSGB Grid cells.
Activity counts of 1-10 devices are masked and replaced by “*” in the database, as low counts present potentially identifiable information.
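The cleaning, grid-linkage, counting, and masking steps above can be sketched as follows. This is not Huq's pipeline. It assumes each record has already been projected from latitude/longitude to British National Grid eastings and northings in metres (for example with a projection library such as pyproj), so a 1 km cell is found by integer division by 1000. The record layout and the `activity_counts` helper are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 1000   # 1 km x 1 km OSGB grid cells, coordinates in metres
MASK_THRESHOLD = 10  # counts of 1-10 devices are replaced by "*"


def activity_counts(records):
    """Count distinct devices per 1 km grid cell per day.

    records: iterable of (device_id, iso_timestamp, easting_m, northing_m),
    with coordinates already projected to the British National Grid (assumed).
    Returns {(cell_x, cell_y, date): count}, with low counts masked as "*".
    """
    devices = defaultdict(set)
    for device_id, timestamp, easting, northing in records:
        date = timestamp[:10]  # keep only the daily date (YYYY-MM-DD)
        cell_x = int(easting // CELL_SIZE_M)
        cell_y = int(northing // CELL_SIZE_M)
        # A set removes repeat impressions of the same device in a cell-day.
        devices[(cell_x, cell_y, date)].add(device_id)
    return {
        key: "*" if len(ids) <= MASK_THRESHOLD else len(ids)
        for key, ids in devices.items()
    }


# Hypothetical example: 12 devices in one cell, one device seen 3 times in another.
records = [(f"dev{i}", "2020-03-01T08:00:00", 530_100.0, 180_900.0) for i in range(12)]
records += [("dev0", f"2020-03-01T{h:02d}:00:00", 531_500.0, 180_200.0) for h in (9, 12, 18)]
print(activity_counts(records))
# {(530, 180, '2020-03-01'): 12, (531, 180, '2020-03-01'): '*'}
```

The second cell shows why counting distinct IDs matters: three impressions from the same device yield a count of 1, which is then suppressed.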
For detailed description of the columns contained within the data, see the Variable Dictionary; and for an overview of the characteristics of the data, see the Data Summary. These files can be downloaded from the bottom of this page.
Excellent quality and coverage for major towns and cities. The data may be less complete for smaller settlements or more rural areas. Data are subject to suppression of potentially disclosive low counts as detailed above. Huq collects data from a varying mix of apps, the identities of which are commercially sensitive. Apps may be added to or deleted from the secure and summary data products over time. This, along with increasing national coverage and mobile phone uptake, results in general increases in apparent activity over the period covered by the data.
The dataset would benefit from comparison with population estimates (e.g. census data) to investigate coverage issues. 2016 data have the highest percentage of suppressed counts, and data suppression generally decreases over time, particularly in metropolitan (Met) areas. Data suppression levels in metropolitan areas generally fall below 50% by 2020.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset comprises user feedback data collected from 15 globally acclaimed mobile applications, spanning diverse categories. The included applications are among the most downloaded worldwide, providing a rich and varied source for analysis. The dataset is particularly suitable for Natural Language Processing (NLP) applications, such as text classification and topic modeling.
This dataset is open access for scientific research and non-commercial purposes. Users are required to acknowledge the authors' work and, in the case of scientific publication, cite the most appropriate reference:
1. Paper
M. H. Asnawi, A. A. Pravitasari, T. Herawan, and T. Hendrawati, "The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling," in IEEE Access, vol. 11, pp. 130272–130286, 2023, doi: https://doi.org/10.1109/ACCESS.2023.3332644
2. Dataset
Asnawi, M. H., Pravitasari, A. A., Herawan, T., & Hendrawati, T. (2023). User Feedback Dataset from the Top 15 Downloaded Mobile Applications [Data set]. In The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling (1.0.0, Vol. 11, pp. 130272–130286). Zenodo. https://doi.org/10.5281/zenodo.10204232
Researchers and analysts are encouraged to explore this dataset for insights into user sentiments, preferences, and trends across these top mobile applications. If you have any questions or need further information, feel free to contact the dataset authors.
As of May 2023, product interaction data were the most commonly collected data points, with 94 of the 100 analyzed apps reporting that they collect such data. User IDs and crash data were collected by 93 and 92 of the 100 apps, respectively. Among the 10 leading shopping apps on the Apple App Store, all collected precise location, physical address, and payment information.
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
The iOS App Store launched in 2008 with 500 apps. Today, there are over four million apps available across iOS and Android platforms, extending to a wide range of sub-genres and niches. These apps...
License: https://www.paradoxintelligence.com/terms
App usage patterns and mobile behavior analytics providing digital ecosystem insights for institutional investors.
As of May 2023, the mobile app version of popular ********************************* used ** of the data points they collected to track their iOS users, as well as collecting ** data points connected to the user's identity. Facebook, which was identified as the most data-hungry of all mobile social media apps, used ***** of its ** collected data points to track users. Dating app ****** collected ** data points connected to the users' identity, as well as **** data points to track user activity.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
This dataset provides insights into mobile app infrastructure patterns from 1,182 apps sampled from 69 Danish participants in the Datafied Living research project (PI: Professor Stine Lomborg). It comprises several files, offering a multifaceted view of app ecosystems. This dataset is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, Datafied Living at the University of Copenhagen (Grant agreement ID: 947735).
App List (iOS & Android): Two files list all the apps used by the participants, providing a foundation for further analysis (749 iOS apps, 433 Android apps).
SDK Detection Files (iOS & Android): Two files detail the Software Development Kits (SDKs) detected within the apps. One file covers iOS apps and the other covers Android apps, allowing for investigation into the technologies powering user applications.
Android SDK Owners File: This file identifies some of the owners of the Android SDKs used within the Android apps.
iOS SDK Identifiers: This file identifies some of the SDKs found on iOS, with regex patterns and related information.
iOS Protected Resource Identifiers: This file contains all iOS Protected Resources, also known as Application Permissions.
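The regex-based SDK identification in the Datafied Living iOS SDK Identifiers file can be illustrated with a small matcher over class or symbol names extracted from an app. The three prefixes below (FBSDK, FIR, GAD) are illustrative assumptions about common class-name prefixes, not the patterns shipped in the dataset, and `detect_sdks` is a hypothetical helper.

```python
import re

# Illustrative patterns only; the dataset's own file defines the real ones.
SDK_PATTERNS = {
    "Facebook SDK": re.compile(r"^FBSDK"),
    "Firebase": re.compile(r"^FIR[A-Z]"),
    "Google Mobile Ads": re.compile(r"^GAD[A-Z]"),
}


def detect_sdks(symbols, patterns=SDK_PATTERNS):
    """Return the sorted names of SDKs whose pattern matches any symbol."""
    found = set()
    for name, pattern in patterns.items():
        if any(pattern.search(s) for s in symbols):
            found.add(name)
    return sorted(found)


class_names = ["FBSDKLoginManager", "FIRApp", "AppDelegate", "ViewController"]
print(detect_sdks(class_names))  # ['Facebook SDK', 'Firebase']
```

Anchoring each pattern with `^` keeps short prefixes such as GAD from matching unrelated names that merely contain those letters.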