CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include the USA, China, Japan, Germany, France, Brazil, the UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about: (1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download; (2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income; and (3) their personality, using the Big-Five personality traits. This dataset contains the results of the survey.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
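The three kinds of noise described for this dataset (single-character typos, missing values, injected random strings) can be handled for a categorical column with fuzzy matching against the known categories. The following is a minimal sketch using only the Python standard library; the function name `clean_category`, the 0.8 cutoff, and the sample values are illustrative assumptions, not part of the dataset or its tooling.

```python
import difflib

# Canonical values for the "Device OS" column, taken from the description above.
KNOWN_OS = ["Android", "iOS"]


def clean_category(value, known, cutoff=0.8):
    """Map a possibly corrupted cell onto a known category.

    Returns the canonical value, or None when the cell is missing or does
    not resemble any known category (for example an injected random string).
    The cutoff is an assumed similarity threshold, not a dataset property.
    """
    if value is None or not str(value).strip():
        return None
    text = str(value).strip()
    matches = difflib.get_close_matches(text, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical sample cells: clean, typo, missing, clean, random string, blank.
raw = ["Android", "Andro1d", None, "iOS", "x9Kq2Lm", ""]
cleaned = [clean_category(v, KNOWN_OS) for v in raw]
print(cleaned)  # ['Android', 'Android', None, 'iOS', None, None]
```

One caveat with this approach: a single-character edit to a very short label (for example "i0S") falls below the 0.8 cutoff and would be treated as unrecognized, so short categories may need a lower threshold or an explicit alias table.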
Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.
Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN https://interactionmining.org/rico
The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
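The input construction described above (one image per UI, built from the bounding boxes of all leaf elements, with text and non-text elements distinguished) can be sketched as follows. This is only an illustration under stated assumptions: the node keys `bounds`, `children`, and `text`, the coarse 8x12 grid, and the cell codes are hypothetical choices for the example and are not confirmed to match Rico's file format or the authors' actual image resolution.

```python
def leaf_boxes(node):
    """Yield (bounds, is_text) for every leaf of a view-hierarchy dict.

    Assumed node shape: {"bounds": [left, top, right, bottom],
    "children": [...], "text": "..."}; the key names are illustrative.
    """
    children = [c for c in (node.get("children") or []) if c]
    if not children:
        yield node["bounds"], bool(node.get("text"))
        return
    for child in children:
        yield from leaf_boxes(child)


def rasterize(root, screen_w, screen_h, grid_w=8, grid_h=12):
    """Paint leaf bounding boxes onto a coarse grid.

    Cell values: 0 = empty, 1 = non-text element, 2 = text element.
    """
    grid = [[0] * grid_w for _ in range(grid_h)]
    for (left, top, right, bottom), is_text in leaf_boxes(root):
        x0 = max(0, left * grid_w // screen_w)
        y0 = max(0, top * grid_h // screen_h)
        x1 = min(grid_w, -(-right * grid_w // screen_w))   # ceiling division
        y1 = min(grid_h, -(-bottom * grid_h // screen_h))  # ceiling division
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = 2 if is_text else 1
    return grid


# Hypothetical screen: a title bar (text) above a half-width image (non-text).
screen = {
    "bounds": [0, 0, 1080, 1920],
    "children": [
        {"bounds": [0, 0, 1080, 160], "text": "Title"},
        {"bounds": [0, 160, 1080, 1920],
         "children": [{"bounds": [0, 160, 540, 960]}]},
    ],
}
grid = rasterize(screen, 1080, 1920)
for row in grid:
    print("".join(str(cell) for cell in row))
```

Only leaves are painted, so the full-width container in the example leaves no mark of its own; the grid records the title bar as text cells and the image as non-text cells.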
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises user feedback data collected from 15 globally acclaimed mobile applications, spanning diverse categories. The included applications are among the most downloaded worldwide, providing a rich and varied source for analysis. The dataset is particularly suitable for Natural Language Processing (NLP) applications, such as text classification and topic modeling.
This dataset is open access for scientific research and non-commercial purposes. Users are required to acknowledge the authors' work and, in the case of scientific publication, cite the most appropriate reference:
1. Paper
M. H. Asnawi, A. A. Pravitasari, T. Herawan, and T. Hendrawati, "The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling," in IEEE Access, vol. 11, pp. 130272-130286, 2023, doi: https://doi.org/10.1109/ACCESS.2023.3332644
2. Dataset
Asnawi, M. H., Pravitasari, A. A., Herawan, T., & Hendrawati, T. (2023). User Feedback Dataset from the Top 15 Downloaded Mobile Applications [Data set]. In The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling (1.0.0, Vol. 11, pp. 130272–130286). Zenodo. https://doi.org/10.5281/zenodo.10204232
Researchers and analysts are encouraged to explore this dataset for insights into user sentiments, preferences, and trends across these top mobile applications. If you have any questions or need further information, feel free to contact the dataset authors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
During the study period
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
🚀 MobileViews: A Large-Scale Mobile GUI Dataset
MobileViews is a large-scale dataset designed to support research on mobile agents and mobile user interface (UI) analysis. The first release, MobileViews-600K, includes over 600,000 mobile UI screenshot-view hierarchy (VH) pairs collected from over 20,000 apps on the Google Play Store. This dataset is based on DroidBot, which we have optimized for large-scale data collection, capturing more comprehensive interaction details while… See the full description on the dataset page: https://huggingface.co/datasets/mllmTeam/MobileViews.
https://crawlfeeds.com/privacy_policy
Looking for a Google Play apps dataset to analyze mobile app trends? The Google Play Store Apps Dataset delivers ~10,000 app records from the Google Play Store, including key app metadata like app name, category, rating, installs, price, developer details, and more. This dataset is ideal for app market research, mobile analytics, app store optimization studies (ASO), data science projects, and trend analysis.
Collect structured data on apps across genres and niches, so you can build visualizations, train machine-learning models, analyze user engagement, or compare categories like games, productivity, health & fitness, and finance.
Rich App Metadata: Includes app_id, app_name, category, rating, review_count, price, installs, content_rating, genres, last_updated, current_version, android_version, developer_name, developer_email, …
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of online malicious activities, including email scams, ransomware, and malicious domains. As the number of cases continues to surge, malicious campaigns that use the coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps.
However, apart from a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset is available for researchers to use in COVID-19 related cybersecurity studies.
We have set out to create a daily growing dataset of COVID-19 related mobile apps. As of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number is growing daily, and the dataset is updated weekly. For more details, please visit https://covid19apps.github.io
This dataset includes the following files:
(1) covid19apps.xlsx
This file lists information on all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.
(2) covid19apps.zip
The APK samples of the COVID-19 themed apps are stored in zip files. To reduce the size of any single file, the samples are split across multiple zip files. Each APK file is named after its SHA256 hash.
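Because each APK sample is named after its SHA256 hash, a downloaded sample can be checked for integrity by recomputing the hash and comparing it with the file name. The sketch below uses only the Python standard library; the helper names and the `<sha256>.apk` naming with an `.apk` extension are assumptions for illustration, and the demo files contain placeholder bytes rather than real APKs.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path):
    """Check that the file name (without extension) equals the file's SHA256."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.lower() == sha256_of_file(path)


# Demo with placeholder content; real samples would come from the zip files.
demo_dir = tempfile.mkdtemp()
payload = b"placeholder bytes, not a real APK"
expected = hashlib.sha256(payload).hexdigest()

good_path = os.path.join(demo_dir, expected + ".apk")
bad_path = os.path.join(demo_dir, "renamed_sample.apk")
for path in (good_path, bad_path):
    with open(path, "wb") as handle:
        handle.write(payload)

good_ok = name_matches_hash(good_path)
bad_ok = name_matches_hash(bad_path)
print(good_ok)  # True
print(bad_ok)   # False
```

The same digest could also be compared against the hash column in covid19apps.xlsx, assuming that column stores SHA256 values in hexadecimal form.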
If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619
(Accepted to Empirical Software Engineering)
@misc{wang2021virus, title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware}, author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu}, year={2021}, eprint={2005.14619}, archivePrefix={arXiv}, primaryClass={cs.CR} }
As of March 2021, YouTube Kids and Facebook Messenger Kids were the mobile apps for children found to collect the largest amount of data from global iOS users. The apps collected a total of 15 data points from each of the examined data types. Language learning app Lingokids and educational app ABCmouse followed with 10 data points. The types of data that the examined children's apps collected most often were contact information and diagnostics.
Children mobile privacy
From online education to gaming and social media, children and young users are increasingly active in online environments via mobile devices. In 2021, playing online games and watching YouTube videos figured among the most popular mobile activities for kids worldwide, while less than five in 10 reported using their phones to complete assignments for school. As vulnerable users, children are entitled to institutional protection and lower interference from tech companies. However, mobile apps designed for children still collect data from their young users. As of the beginning of 2022, money management and gaming apps were the app categories found to track the largest number of data segments from children, with 10.1 and 9.3 data points tracked, respectively.
Child proof social media?
While the impact of social media on younger users’ development is yet to be fully understood, parents and educators were quick to realize that social media expands the range of dangers children can encounter while being online. In 2021, children in the United States and in the United Kingdom spent an average of 98 minutes per day on TikTok, as well as 83 minutes daily on Snapchat. In the U.S., both Snapchat and TikTok agreed to respect the age limit restrictions set by the Children's Online Privacy Protection Act (COPPA), and while Snapchat discontinued its children-specific Snapkidz app in 2016, TikTok relies on its TikTok Younger Users platform for users younger than 13. Despite the majority of social media services requiring users to be at least 13 years old, a survey conducted in 2021 in the United Kingdom has found that 60 percent of all surveyed kids aged between eight and 11 had their own social media profile.
https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: not specified.
Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
title: The title given to the review by the user, often summarizing their main point.
isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
Reviewer & Rating Information:
username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
App & Origin Context:
app_name: The name of the application being reviewed.
app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
Metadata & Timestamps:
_id: A unique identifier for the specific review record in the dataset.
crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
date: The original date the review was posted by the user on the App Store.
Expanded Use Cases & Analytical Applications:
This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
… review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
… review text to inform future product roadmap decisions and develop features users actively desire.
… review field.
… rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
Market Research & Competitive Intelligence:
Marketing & App Store Optimization (ASO):
… review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
… rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
Academic & Data Science Research:
… review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
… rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
… country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.
This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
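As a small illustration of the geographic segmentation mentioned for this dataset, the sketch below averages the `rating` field per `country` storefront. The field names come from the field list above; the sample records, the country-code format, and the function name are hypothetical, and records with a missing or out-of-range rating are skipped rather than imputed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records shaped like the documented fields.
reviews = [
    {"country": "US", "rating": 5, "isEdited": False, "review": "Great app"},
    {"country": "US", "rating": 2, "isEdited": True, "review": "Crashes on launch"},
    {"country": "DE", "rating": 4, "isEdited": False, "review": "Works well"},
    {"country": "DE", "rating": None, "isEdited": False, "review": ""},
]


def mean_rating_by_country(records):
    """Average the star rating per storefront country.

    Ratings outside the typical 1-5 range, or missing, are ignored.
    """
    by_country = defaultdict(list)
    for record in records:
        rating = record.get("rating")
        if isinstance(rating, (int, float)) and 1 <= rating <= 5:
            by_country[record.get("country")].append(rating)
    return {country: mean(values) for country, values in by_country.items()}


averages = mean_rating_by_country(reviews)
print(averages)  # {'US': 3.5, 'DE': 4}
```

The same grouping pattern extends to other fields, for example bucketing by `date` to follow rating trends around an app update.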
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of the permissions that apps request during installation and at run time. We collected apps from three different sources: Google Play, third-party app stores, and malware datasets. The dataset covers more than 500,000 Android apps, with features extracted at the time of installation and execution. One file contains the names of the features; the others contain, for each .apk file, its extracted permissions and API calls. Benign apps were collected from Google's Play Store, hiapk, app china, Android, mumayi, gfan, slideme, and pandaapp. The .apk files were collected continuously over the last three years and contain 81 distinct malware families.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes only publicly available metadata scraped from the Google Play Store. No personal user data or proprietary binaries are included. All information is accessible directly through the public Play Store interface. Data included: "App ID", "App Name", "Category", "Description", "Updated on", "Version", "Downloads", "Released on", "Release notes", "Similar AppIDs", and "Image links". Data collected in August 2025.
As of May 2023, the mobile app version of popular ********************************* used ** of the data points they collected to track their iOS users, as well as collecting ** data points connected to the user's identity. Facebook, which was identified as the most data-hungry app among all the mobile social media, used ***** of its ** collected data points to track users. Dating app ****** collected ** data points connected to the users' identity, as well as **** data points to track users' activity.
As of May 2023, product interaction data were the most commonly collected data points, with 94 of the 100 analyzed apps reporting that they collect such data. User ID and crash data were collected by 93 and 92 of the 100 apps, respectively. All of the 10 leading shopping apps hosted on the Apple App Store collected precise location, physical address, and payment info.
https://www.paradoxintelligence.com/terms
Real-time mobile app analytics and user engagement metrics across iOS and Android platforms for institutional investment research and market intelligence.
This dataset contains app descriptions from the Google Play Store, annotated with labels identifying functional and non-functional features. Functional features describe the core functionalities and capabilities of the apps, such as specific tasks they perform or services they provide. Non-functional features refer to attributes related to the performance, usability, reliability, and other quality aspects of the apps. This labeled dataset can be used for various tasks such as natural language processing, machine learning, and feature extraction to enhance app analysis and categorization.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description: Frappe Processed Dataset
The Frappe dataset has been processed to refine the quality of user-item interactions by removing entries where either users or items had fewer than 5 interactions. This pruning resulted in a significant reduction in the dataset size:
Number of Users: 651 (a reduction of 31.97% from the original dataset)
Number of Items: 1,127 (a reduction of 72.39%)
Total Number of Interactions: 84,373 (a reduction of 12.30%)
Columns Overview: The dataset… See the full description on the dataset page: https://huggingface.co/datasets/abadesalex/Frappe-mobile-app-usage.
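The pruning rule described for this dataset (dropping interactions whose user or item has fewer than 5 interactions) can be sketched as below. The description does not say whether the filter was applied once or repeated until nothing more is removed; this sketch assumes the repeated form, since removing a sparse item can push a user below the threshold. The function name and toy data are hypothetical, and the demo uses a threshold of 2 to keep the example small.

```python
from collections import Counter


def prune_interactions(pairs, min_count=5):
    """Drop (user, item) pairs whose user or item has fewer than min_count
    interactions, repeating until the set of pairs stops shrinking."""
    pairs = list(pairs)
    while True:
        user_counts = Counter(user for user, _ in pairs)
        item_counts = Counter(item for _, item in pairs)
        kept = [
            (user, item)
            for user, item in pairs
            if user_counts[user] >= min_count and item_counts[item] >= min_count
        ]
        if len(kept) == len(pairs):
            return kept
        pairs = kept


# Toy interaction log; item "c" is rare, which in turn makes user "u3" rare.
log = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "a"), ("u3", "c"),
]
pruned = prune_interactions(log, min_count=2)
reduction = 1 - len(pruned) / len(log)
print(pruned)                 # [('u1', 'a'), ('u1', 'b'), ('u2', 'a'), ('u2', 'b')]
print(f"{reduction:.2%}")     # 33.33%
```

The reduction percentages quoted for the dataset follow the same formula as `reduction` here: one minus the ratio of the remaining count to the original count.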
As of March 2021, Waze was the mobile GPS navigation app found to collect the largest amount of data from global iOS users, with 21 data points collected across all examined segments. Maps.me collected a total of 20 data points from its users, including five data points on contact information. Hiking and trail GPS map Gaia followed, with 13 data points.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Object Detection Mobile App is a dataset for object detection tasks - it contains Objects annotations for 2,255 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The graph shows a comparison of app downloads worldwide from 2018 to 2024, using data from Sensor Tower and data.ai. Global app downloads have plateaued in recent years, even declining, after seeing strong growth during the COVID-19 pandemic. For 2024, *** billion unique downloads per user account were recorded.
Why the difference? Source methodology explains the gap
The discrepancy arises from significant differences in the methodology used by the sources to aggregate and generate the data. Sensor Tower reports only unique downloads per user account, excluding app updates, re-downloads, and installations on multiple devices by the same user. In contrast, data.ai includes these additional activities as well as downloads from third-party Android stores and a broader geographic scope, resulting in substantially higher total counts. As a result, Sensor Tower's numbers better reflect new user acquisition, while data.ai's encompass all market activity and total engagement.
Despite stagnating downloads, user spending is growing
While the number of downloads is leveling off, consumer spending on in-app purchases and related revenue has grown in 2024 to *** billion U.S. dollars, up from around *** billion U.S. dollars in 2023. While gaming remains the highest grossing app category overall, other categories drove the growth. The entertainment, photo & video, productivity, and social networking categories each grew by at least *** billion U.S. dollars in revenue in 2024 compared to the previous year.
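The methodological gap between counting unique downloads per user account and counting all download activity can be illustrated with a toy event log. This is only a sketch of the two counting rules as described above, not either provider's actual pipeline; the event tuples and function names are hypothetical, and the third-party stores and wider geographic scope that data.ai is said to include are not modeled.

```python
# Hypothetical download events: (account, app, kind of activity).
events = [
    ("alice", "maps", "install"),
    ("alice", "maps", "update"),
    ("alice", "maps", "reinstall"),
    ("alice", "chat", "install"),
    ("bob", "maps", "install"),
    ("bob", "maps", "install"),  # same account, second device
]


def unique_downloads(log):
    """Count each (account, app) pair once, ignoring updates,
    re-downloads, and installs on additional devices."""
    return len({(account, app) for account, app, _ in log})


def total_activity(log):
    """Count every recorded event, including updates and re-downloads."""
    return len(log)


unique_count = unique_downloads(events)
total_count = total_activity(events)
print(unique_count)  # 3
print(total_count)   # 6
```

On the same underlying activity, the two rules differ by a factor of two in this toy log, which is the kind of divergence the source attributes to methodology rather than to measurement error.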