http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
The ever-changing mobile landscape is a challenging space to navigate. The share of mobile relative to desktop usage keeps increasing. Android holds about 53.2% of the smartphone market, while iOS holds 43%. To get more people to download your app, you need to make sure they can easily find it. Mobile app analytics is a great way to understand the existing strategy and to drive growth and retention of future users.
With millions of apps available today, this data set is useful for identifying top trending apps in the iOS App Store. It contains details of more than 7,000 Apple iOS mobile applications. The data was extracted from the iTunes Search API at the Apple Inc. website. R and Linux web scraping tools were used for this study.
The full interactive Shiny app can be seen here: https://multiscal.shinyapps.io/appStore/
Data collection date (from API): July 2017
Dimensions of the data set: 7197 rows and 16 columns
"id": App ID
"track_name": App Name
"size_bytes": Size (in bytes)
"currency": Currency Type
"price": Price amount
"rating_count_tot": User rating count (for all versions)
"rating_count_ver": User rating count (for the current version)
"user_rating": Average user rating value (for all versions)
"user_rating_ver": Average user rating value (for the current version)
"ver": Latest version code
"cont_rating": Content Rating
"prime_genre": Primary Genre
"sup_devices.num": Number of supported devices
"ipadSc_urls.num": Number of screenshots shown for display
"lang.num": Number of supported languages
"vpp_lic": VPP device-based licensing enabled
Reference: R package
Install from GitHub with:
devtools::install_github("ramamet/applestoreR")
Copyright (c) 2018 Ramanathan Perumal
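As a rough illustration of how the columns listed above might be used, the following Python sketch averages "user_rating" per "prime_genre". The sample rows are made up for the example, and the rule of skipping apps with no ratings is an assumption, not something the dataset description specifies.

```python
import csv
import io
from collections import defaultdict


def mean_rating_by_genre(rows, min_ratings=1):
    """Average `user_rating` per `prime_genre`, skipping apps with too few ratings."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [sum of ratings, app count]
    for row in rows:
        # Assumption: an app with no ratings should not count toward the mean.
        if int(row["rating_count_tot"]) < min_ratings:
            continue
        bucket = totals[row["prime_genre"]]
        bucket[0] += float(row["user_rating"])
        bucket[1] += 1
    return {genre: total / count for genre, (total, count) in totals.items()}


# Tiny made-up sample that reuses the documented column names (not real data).
SAMPLE = """id,track_name,price,rating_count_tot,user_rating,prime_genre
1,Alpha,0.00,100,4.5,Games
2,Beta,1.99,50,3.5,Games
3,Gamma,0.00,0,0.0,Utilities
4,Delta,2.99,10,4.0,Utilities
"""

sample_means = mean_rating_by_genre(csv.DictReader(io.StringIO(SAMPLE)))
print(sample_means)
```

To run the same function on the real file, pass it a `csv.DictReader` opened on the downloaded CSV; the file name depends on where the data was obtained.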
https://creativecommons.org/publicdomain/zero/1.0/
From Harvard Dataverse
Description: We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include USA, China, Japan, Germany, France, Brazil, UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about: (1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download; (2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income; and (3) their personality, using the Big-Five personality traits. This dataset contains the results of the survey.
Author: Lim, Soo Ling, 2014, "Worldwide Mobile App User Behavior Dataset", https://doi.org/10.7910/DVN/27459, Harvard Dataverse, V1
Author affiliation: University College London
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
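The kinds of noise described for this dataset (single-character typos, blank cells, injected random strings) can be handled in several ways. The sketch below shows one possible approach for a categorical field such as the device OS, using fuzzy matching from the Python standard library. The function name, the list of known values, and the similarity cutoff are all illustrative assumptions rather than part of the dataset.

```python
import difflib

KNOWN_OS = ["Android", "iOS"]  # canonical values named in the description


def clean_category(value, known, cutoff=0.8):
    """Map a possibly corrupted label onto a known value, or return None."""
    if value is None or not str(value).strip():
        return None  # blank or missing cell
    text = str(value).strip()
    if text in known:
        return text
    # A single-character typo in a longer label keeps a high similarity ratio.
    match = difflib.get_close_matches(text, known, n=1, cutoff=cutoff)
    return match[0] if match else None  # unrelated strings fall through to None


raw = ["Android", "Andro1d", None, "", "iOS", "x9Kq2LpZ"]
cleaned = [clean_category(v, KNOWN_OS) for v in raw]
print(cleaned)
```

Very short labels are a weak spot of this approach: one edited character in a three-letter value such as "iOS" lowers the similarity ratio below 0.8, so such fields may need a lower cutoff or an explicit lookup table.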
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
During the study period
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset simulates anonymized mobile screen time and app usage data collected from Android/iOS users over a 3-month period (Jan–April 2024). It captures daily usage trends across various app categories including:
Productivity: Google Docs, Notion, Slack
Entertainment: YouTube, Netflix, TikTok
Social Media: Instagram, WhatsApp, Facebook
Utilities: Chrome, Gmail, Maps
For YouTube, additional engagement statistics such as views, likes, and comments are included to analyze video popularity and content consumption behavior.
The dataset enables exploration of:
Productivity vs. entertainment screen time patterns
Daily usage fluctuations
App-specific user engagement
Correlation between time spent and user interactions
YouTube content virality metrics
This is a great resource for:
EDA projects
Behavioral clustering
Dashboard development
Time series and anomaly detection
Building recommendation or focus-assistive apps
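One of the explorations mentioned above, the correlation between time spent and user interactions, can be sketched with a plain Pearson coefficient. The two input lists below are made-up numbers, and the variable names are hypothetical since the description does not give the dataset's column names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


minutes = [10, 20, 30, 40]  # made-up daily minutes spent in one app
likes = [1, 2, 3, 4]        # made-up interaction counts for the same days
r = pearson(minutes, likes)
print(round(r, 6))
```

A coefficient near 1 or -1 indicates a strong linear relationship; it says nothing about causation, and daily usage series are often autocorrelated, which can inflate the value.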
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
🚀 MobileViews: A Large-Scale Mobile GUI Dataset
MobileViews is a large-scale dataset designed to support research on mobile agents and mobile user interface (UI) analysis. The first release, MobileViews-600K, includes over 600,000 mobile UI screenshot-view hierarchy (VH) pairs collected from over 20,000 apps on the Google Play Store. This dataset is based on DroidBot, which we have optimized for large-scale data collection, capturing more comprehensive interaction details while… See the full description on the dataset page: https://huggingface.co/datasets/mllmTeam/MobileViews.
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of online malicious activities, including email scams, ransomware, and malicious domains. As the number of cases continues to surge, malicious campaigns that use coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps.
However, apart from a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset is available for researchers to advance COVID-19 related cybersecurity studies.
We are working to create a daily growing COVID-19 related mobile app dataset. As of mid-November, we had curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number is growing daily and the dataset is updated weekly. For more details, please visit https://covid19apps.github.io
This dataset includes the following files:
(1) covid19apps.xlsx
In this file, we list information on all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.
(2) covid19apps.zip
We store the APK samples of the COVID-19 themed apps in zip files. To reduce the size of any single file, the samples are divided across multiple zip files. Each APK file is named after its SHA256 hash.
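Because each sample is named after its SHA256 hash, an extracted file can be checked for corruption or renaming by recomputing the digest. The following Python sketch shows one way to do that. The helper names are hypothetical, and whether the files carry an `.apk` extension is an assumption, so the check compares only the file name without its extension.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path):
    """True if the file name (without extension) equals the file's SHA256."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.lower() == sha256_of_file(path)


# Demonstration on throwaway files; the payload is not a real APK.
with tempfile.TemporaryDirectory() as tmp:
    payload = b"not a real apk"
    good = os.path.join(tmp, hashlib.sha256(payload).hexdigest() + ".apk")
    bad = os.path.join(tmp, "renamed.apk")
    for target in (good, bad):
        with open(target, "wb") as fh:
            fh.write(payload)
    results = (name_matches_hash(good), name_matches_hash(bad))
print(results)
```

Since the archive contains malware samples, a check like this should only read the files and never execute them, ideally inside an isolated environment.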
If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619
(Accepted to Empirical Software Engineering)
@misc{wang2021virus, title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware}, author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu}, year={2021}, eprint={2005.14619}, archivePrefix={arXiv}, primaryClass={cs.CR} }
https://crawlfeeds.com/privacy_policy
Looking for a Google Play apps dataset to analyze mobile app trends? The Google Play Store Apps Dataset delivers ~10,000 app records from the Google Play Store, including key app metadata like app name, category, rating, installs, price, developer details, and more. This dataset is ideal for app market research, mobile analytics, app store optimization studies (ASO), data science projects, and trend analysis.
Collect structured data on apps across genres and niches, so you can build visualizations, train machine-learning models, analyze user engagement, or compare categories like games, productivity, health & fitness, and finance.
Rich App Metadata: Includes app_id, app_name, category, rating, review_count, price, installs, content_rating, genres, last_updated, current_version, android_version, developer_name, developer_email, …
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises user feedback data collected from 15 globally acclaimed mobile applications, spanning diverse categories. The included applications are among the most downloaded worldwide, providing a rich and varied source for analysis. The dataset is particularly suitable for Natural Language Processing (NLP) applications, such as text classification and topic modeling.
This dataset is open access for scientific research and non-commercial purposes. Users are required to acknowledge the authors' work and, in the case of scientific publication, cite the most appropriate reference:
1. Paper
M. H. Asnawi, A. A. Pravitasari, T. Herawan, and T. Hendrawati, "The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling," in IEEE Access, vol. 11, pp. 130272-130286, 2023, doi: https://doi.org/10.1109/ACCESS.2023.3332644
2. Dataset
Asnawi, M. H., Pravitasari, A. A., Herawan, T., & Hendrawati, T. (2023). User Feedback Dataset from the Top 15 Downloaded Mobile Applications [Data set]. In The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling (1.0.0, Vol. 11, pp. 130272–130286). Zenodo. https://doi.org/10.5281/zenodo.10204232
Researchers and analysts are encouraged to explore this dataset for insights into user sentiments, preferences, and trends across these top mobile applications. If you have any questions or need further information, feel free to contact the dataset authors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of the permissions that apps request during installation and at run-time. We collected apps from three different sources: Google Play, third-party app stores, and malware datasets. The collection contains more than 500,000 Android apps, with features extracted at the time of installation and execution. One file contains the names of the features, and the others contain, for each .apk file, its extracted permissions and API calls. Benign apps were collected from Google's Play Store, hiapk, app china, Android, mumayi, gfan, slideme, and pandaapp. The .apk files were collected continuously over the last three years and contain 81 distinct malware families.
https://www.paradoxintelligence.com/terms
App usage patterns and mobile behavior analytics.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description: Frappe Processed Dataset
The Frappe dataset has been processed to refine the quality of user-item interactions by removing entries where either users or items had fewer than 5 interactions. This pruning resulted in a significant reduction in the dataset size:
Number of Users: 651 (a reduction of 31.97% from the original dataset)
Number of Items: 1127 (a reduction of 72.39%)
Total Number of Interactions: 84,373 (a reduction of 12.30%)
Columns Overview: The dataset… See the full description on the dataset page: https://huggingface.co/datasets/abadesalex/Frappe-mobile-app-usage.
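The pruning rule described for this dataset, dropping users and items with fewer than 5 interactions, can be sketched as follows. The description does not say whether the filter was applied once or repeated until stable, so the repeated version below is an assumption, and the function name and the (user, item) tuple layout are illustrative only.

```python
from collections import Counter


def prune_sparse(interactions, min_count=5):
    """Repeatedly drop (user, item) pairs whose user or item is too rare."""
    kept = list(interactions)
    while True:
        user_counts = Counter(user for user, _ in kept)
        item_counts = Counter(item for _, item in kept)
        filtered = [
            (user, item)
            for user, item in kept
            if user_counts[user] >= min_count and item_counts[item] >= min_count
        ]
        if len(filtered) == len(kept):
            return filtered  # nothing removed in this pass, so the result is stable
        kept = filtered


# Made-up interactions; a threshold of 2 keeps the example small.
toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
pruned = prune_sparse(toy, min_count=2)
print(pruned)
```

In the toy data, removing the rare item "c" leaves user "u3" with a single interaction, so a second pass removes that user too. A single-pass filter would keep ("u3", "a"), which is why the two variants can give different counts.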
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Mobile Device Usage and User Behavior Dataset contains 700 structured samples including app usage metrics, screen time patterns, battery consumption data, session duration, and user behavior classification labels. Designed for AI/ML model training, behavioral analytics, predictive modeling, and mobile performance optimization research.
https://brightdata.com/license
This dataset encompasses a wide-ranging collection of Google Play applications, providing a holistic view of the diverse ecosystem within the platform. It includes information on various attributes such as the title, developer, monetization features, images, app descriptions, data safety measures, user ratings, number of reviews, star rating distributions, user feedback, recent updates, related applications by the same developer, content ratings, estimated downloads, and timestamps. By aggregating this data, the dataset offers researchers, developers, and analysts an extensive resource to explore and analyze trends, patterns, and dynamics within the Google Play Store. Researchers can utilize this dataset to conduct comprehensive studies on user behavior, market trends, and the impact of various factors on app success. Developers can leverage the insights derived from this dataset to inform their app development strategies, improve user engagement, and optimize monetization techniques. Analysts can employ the dataset to identify emerging trends, assess the performance of different categories of applications, and gain valuable insights into consumer preferences. Overall, this dataset serves as a valuable tool for understanding the broader landscape of the Google Play Store and unlocking actionable insights for various stakeholders in the mobile app industry.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains features extracted from Hybrid Apps available for deployment on the Android platform until recently. The data has been drawn from various sources, including existing similar datasets and the Google Play Store or its mirrors. The dataset is labelled to differentiate malicious and benign Hybrid Apps, so it can conveniently be used for supervised learning. Nonetheless, the dataset has adequate attributes to support unsupervised learning tasks as well. The dataset comprises 78,767 samples.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Wael Shaher
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a detailed sentiment analysis of user reviews for the app, alongside comprehensive app information from the Google Play Store. The first dataset (App_Sentiment_Analysis.csv) includes translated review texts, sentiment classifications, and numerical scores for sentiment polarity and subjectivity, offering insights into user opinions and experiences. The second dataset (Review.csv) covers various attributes of several apps, such as their ratings, review counts, sizes, installation numbers, content ratings, genres, and more. Together, these datasets facilitate a thorough analysis of user feedback and app performance, supporting app recommendation and improvement strategies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AndroR2 is a dataset of 90 manually reproduced bug reports for Android apps listed on Google Play and hosted on GitHub, systematically collected via an in-depth analysis of 459 reports extracted from the GitHub issue tracker. For each reproduced report, AndroR2 includes the original bug report, an apk file for the buggy version of the app, an executable reproduction script, and metadata regarding the quality of the reproduction steps associated with the original report. We believe that the AndroR2 dataset can be used to facilitate research in automatically analyzing, understanding, reproducing, localizing, and fixing bugs for mobile applications as well as other software maintenance activities more broadly.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
mHealth & Mobile Health Apps
Abstract
Synthetic dataset modeling mHealth and mobile health app usage across three SSA scenarios. Captures phone type, channels (app/SMS/USSD/WhatsApp), enrollment, engagement, use cases, adherence improvement, barriers, and satisfaction. Parameterized from SSA mHealth research.
Parameterization Evidence
Parameter | Value | Source | Year
Mobile penetration SSA | 80% | GSMA/Mechael et al. | 2023
SMS improves ART adherence | 12-20%…
See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/mhealth-mobile-apps.