70 datasets found

Screen Time and App Usage Dataset (iOS/Android)
kaggle.com
zip
Updated Apr 19, 2025
Khushi Yadav (2025). Screen Time and App Usage Dataset (iOS/Android) [Dataset]. https://www.kaggle.com/datasets/khushikyad001/screen-time-and-app-usage-dataset-iosandroid
Available download formats
zip (157,038 bytes)
Authors
Khushi Yadav
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset simulates anonymized mobile screen time and app usage data collected from Android/iOS users over a 3-month period (Jan–April 2024). It captures daily usage trends across various app categories including:

Productivity: Google Docs, Notion, Slack

Entertainment: YouTube, Netflix, TikTok

Social Media: Instagram, WhatsApp, Facebook

Utilities: Chrome, Gmail, Maps

For YouTube, additional engagement statistics such as views, likes, and comments are included to analyze video popularity and content consumption behavior.

The dataset enables exploration of:

Productivity vs. entertainment screen time patterns

Daily usage fluctuations

App-specific user engagement

Correlation between time spent and user interactions

YouTube content virality metrics

This is a great resource for:

EDA projects

Behavioral clustering

Dashboard development

Time series and anomaly detection

Building recommendation or focus-assistive apps
Mobile App Usage | 1st Party | 3B+ events verified, US consumers |...
datarade.ai
omnitrafficdata.mfour.com
.csv, .parquet
Updated Dec 13, 2021
+ more versions
MFour (2021). Mobile App Usage | 1st Party | 3B+ events verified, US consumers | Event-level iOS & Android [Dataset]. https://datarade.ai/data-products/mobile-app-usage-1st-party-3b-events-verified-us-consum-mfour
Available download formats
.csv, .parquet
Dataset authored and provided by
MFour
Area covered
United States of America
Description
This dataset encompasses mobile smartphone application (app) usage, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). Use it for measurement, attribution, or surveying to understand why consumers behave as they do. It covers both the iOS and Android operating systems.

Tie app usage to web and location events using anonymized PanelistID for omnichannel consumer journey understanding.
RICO dataset
kaggle.com
zip
Updated Dec 1, 2021
Onur Gunes (2021). RICO dataset [Dataset]. https://www.kaggle.com/datasets/onurgunes1993/rico-dataset
Available download formats
zip (6,703,669,364 bytes)
Authors
Onur Gunes
Description
Context

Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.

Content

Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.

Acknowledgements

University of Illinois at Urbana-Champaign: https://interactionmining.org/rico

Inspiration

The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
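Query-by-example search over the 64-dimensional layout vectors amounts to a nearest-neighbor lookup. The sketch below assumes the vectors are already loaded into a dict keyed by a hypothetical UI id (the loading step and id scheme are assumptions, not Rico's file format) and ranks candidates by cosine similarity; the similarity measure the Rico authors used is not stated here, so treat this as one reasonable choice.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_uis(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],  # hypothetical UI id -> layout vector
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k UI ids whose layout vectors are most similar to the query."""
    scored = [(ui_id, cosine_similarity(query, vec)) for ui_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for exploring a sample; for all 66k screens, an approximate nearest-neighbor index would be the usual next step.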
Google Play Store Android Apps Dataset in CSV Format
crawlfeeds.com
csv, zip
Updated Nov 9, 2024
Crawl Feeds (2024). Google Play Store Android Apps Dataset in CSV Format [Dataset]. https://crawlfeeds.com/datasets/google-play-store-apps-dataset
Available download formats
zip, csv
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unlock valuable insights with the Google Play Store Android Apps Dataset in CSV format, featuring detailed information on thousands of Android apps available on the Google Play Store. This comprehensive dataset includes key attributes such as App Name, App Logo, Category, Description, Average Rating, Ratings Count, In-app Purchases, Operating System, Company, Content Rating, Images, Email, Additional Information, and more.

Perfect for market researchers, data scientists, app developers, and analysts, this dataset allows for deep analysis of app performance, user preferences, and industry trends. With data on app descriptions, content ratings, in-app purchases, and company information, you can track trends in the mobile app market, evaluate user satisfaction, and conduct competitive analysis.

The dataset is ideal for businesses looking to optimize app strategies, enhance user experience, and improve app performance based on real user feedback. Easily import the data into your favorite analysis tools to gain actionable insights for your app development or research.

With regularly updated data scraped directly from the Google Play Store, the Google Play Store Android Apps Dataset is an invaluable resource for anyone looking to explore trends, track performance, or enhance their app strategies.
Data from: AndroCT: Ten Years of App Call Traces in Android
zenodo.org
explore.openaire.eu
application/gzip, txt
Updated Mar 8, 2022
Wen Li; Xiaoqin Fu; Haipeng Cai (2022). AndroCT: Ten Years of App Call Traces in Android [Dataset]. http://doi.org/10.5281/zenodo.6336104
Available download formats
application/gzip, txt
Unique identifier
https://doi.org/10.5281/zenodo.6336104
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Wen Li; Xiaoqin Fu; Haipeng Cai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A large-scale dataset of dynamic profiles, based on function calls, of 35,974 benign and malicious Android apps from 10 historical years (2010 through 2019). Function calls are a commonly used means of modeling program behavior and can support various code analysis approaches to assuring software correctness, reliability, and security. In particular, our dataset includes dynamic profiles of each app resulting from the same length of time (10 minutes) of being exercised by randomly generated inputs on both an emulator and a real device. This enables app analyses that reason about app behavior from an evolutionary perspective while revealing differences in app behavior across run-time hardware platforms. Since we have 20 yearly datasets associated with the 35,974 unique Android apps across the 10 years, profiling these apps took 12,000 hours. Including the cost of filtering out apps that were originally sampled but could not be profiled (for reasons such as broken APKs, non-executability due to incompatibility issues, or failure to instrument), producing all these traces took over two years. We hope to save future researchers the time of producing such a set of dynamic data for their empirical and technical work.
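The 12,000-hour figure can be sanity-checked from the per-app budget. This assumes each app was profiled once on the emulator and once on the real device for 10 minutes per run, which is our reading of the 20 yearly datasets as 10 years times two platforms, not a statement from the authors.

```python
APPS = 35_974          # unique apps, 2010 through 2019
PLATFORMS = 2          # emulator + real device (assumed: one run on each)
MINUTES_PER_RUN = 10   # fixed profiling time per run

total_minutes = APPS * PLATFORMS * MINUTES_PER_RUN
total_hours = total_minutes / 60
print(f"{total_minutes:,} minutes = {total_hours:,.0f} hours")
```

This prints 719,480 minutes = 11,991 hours, which matches the roughly 12,000 hours reported.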

==================

Thanks for your interest in our dataset. Collecting this dataset took tremendous computational and human effort. Thus, please observe the following restrictions in using our dataset:

- Do not redistribute this dataset without our consent.
- Do not make commercial usage of this dataset.
- Get a faculty, or someone in a permanent position, to agree and commit to these conditions.
- When publishing your work that uses our dataset, please cite the following MSR 2021 data paper.

@inproceedings{AndroidCT,
title = {AndroCT: Ten Years of App Call Traces in Android},
author = {Wen Li and Xiaoqin Fu and Haipeng Cai},
booktitle = {The 18th International Conference on Mining Software Repositories (MSR 2021), Data Showcase Track},
year = {2021},
}
User mobile app interaction data
kaggle.com
zip
Updated Jan 15, 2025
Mohamed Moslemani (2025). User mobile app interaction data [Dataset]. https://www.kaggle.com/datasets/mohamedmoslemani/user-mobile-app-interaction-data/data
Available download formats
zip (6,809,111 bytes)
Authors
Mohamed Moslemani
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.

Key Features Included

User & Session Metadata

User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.

Device & Technical Details

Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.

Location & Locale

Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).

User Properties

Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.

Event-Level Interactions

Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).

Data Quality & “Noise”

To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:

Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.

These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
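A minimal cleaning sketch for the injected noise described above. It assumes rows arrive as dicts with hypothetical keys `device_os` and `session_id` (the real CSV column names may differ). It repairs single-character OS typos by fuzzy-matching against the two known values with `difflib`, leaves missing values as missing, and flags session IDs that do not match the pattern implied by the single example S-123456.

```python
import difflib
import re
from typing import Any, Dict, List, Optional

KNOWN_OS = ["Android", "iOS"]
# Pattern inferred from the one example "S-123456"; the real ID format may vary.
SESSION_ID_RE = re.compile(r"S-\d{6}")


def normalize_os(value: Optional[str]) -> Optional[str]:
    """Map a possibly misspelled OS name (e.g. 'Andro1d') to a known value, else None."""
    if not value:
        return None  # missing values stay missing
    matches = difflib.get_close_matches(value, KNOWN_OS, n=1, cutoff=0.6)
    return matches[0] if matches else None


def clean_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return cleaned copies of rows; 'device_os' and 'session_id' are hypothetical keys."""
    cleaned = []
    for row in rows:
        fixed = dict(row)  # copy so the caller's data is not mutated
        fixed["device_os"] = normalize_os(row.get("device_os"))
        session_id = row.get("session_id")
        fixed["session_id_valid"] = bool(
            isinstance(session_id, str) and SESSION_ID_RE.fullmatch(session_id)
        )
        cleaned.append(fixed)
    return cleaned
```

The 0.6 cutoff is `difflib`'s default and is only a starting point; a random string injected into the OS field usually falls below it and becomes None, which you can then treat like any other missing value.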

Usage & Applications

Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).

Important Notes & Disclaimer

Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
Data from: WhatsNextApp: LSTM-based next-app prediction with app usage...
resodate.org
Updated May 5, 2022
Katerina Katsarou; Geunhye Yu; Felix Beierle (2022). WhatsNextApp: LSTM-based next-app prediction with app usage sequences [Dataset]. http://doi.org/10.14279/depositonce-15575
Unique identifier
https://doi.org/10.14279/depositonce-15575
Dataset provided by
Technische Universität Berlin
DepositOnce
Authors
Katerina Katsarou; Geunhye Yu; Felix Beierle
Description
Next-app prediction can help enhance user interface design, app pre-loading, and network optimizations. Prior work has explored this topic with several different approaches, but challenges such as the user cold-start problem, data sparsity, and privacy concerns related to contextual data like location histories persist. The user cold-start problem occurs when a user has recently registered with the smartphone app system and there is not yet enough information about their preferences and smartphone usage history. In this work, we try to address these issues. We introduce WhatsNextApp, an approach based on LSTM (Long Short-Term Memory) networks using sequences of app usage logs. Our approach is inspired by word embeddings and treats sequences of app usage logs as sequences of words. We collected a real-life data set of 975 Android users with over 22 million app usage events. We build a generic (user-independent) WhatsNextApp model, and the evaluation on our data set shows that it outperforms related studies for existing users, achieving a recall@8 (recall for the top 8 apps) of 92%. For the user cold-start problem with the 500 most frequent apps, we achieve a recall@8 of 82.7%.
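Recall@k as used above counts a prediction as a hit when the app the user actually opened next appears among the model's top k ranked apps. Below is a minimal sketch of that standard metric for single-target next-app prediction. The paper's exact evaluation protocol may differ, and the app names in the usage example are hypothetical.

```python
from typing import Sequence


def recall_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    actual_next: Sequence[str],
    k: int = 8,
) -> float:
    """Fraction of events whose true next app appears in the top-k predicted apps."""
    if len(ranked_predictions) != len(actual_next):
        raise ValueError("need one ranked list per observed next app")
    if not actual_next:
        return 0.0
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, actual_next) if truth in ranked[:k]
    )
    return hits / len(actual_next)
```

For example, with predictions `[["maps", "chrome", "gmail"], ["youtube", "netflix"], ["slack", "docs"]]` and actual next apps `["gmail", "tiktok", "slack"]`, recall@3 is 2/3 and recall@1 is 1/3.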

Google Play Store Apps Dataset

kaggle.com

zip

Updated Oct 30, 2024

Yusuf Delikkaya (2024). Google Play Store Apps Dataset [Dataset]. https://www.kaggle.com/datasets/yusufdelikkaya/google-play-store-apps-dataset/code

Available download formats

zip (319,016 bytes)

Authors

Yusuf Delikkaya

License

CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

Description

The dataset comprises anonymized data on apps available on the Google Play Store, capturing aspects such as ratings, downloads, and categorization.
The dataset has 10,841 entries; some columns contain missing values, particularly "Rating," "Type," "Content Rating," "Current Ver," and "Android Ver".
It can be used to analyze app popularity, user preferences, and app performance metrics across categories, and to study how factors such as size, price, rating, and category relate to downloads.
It can also help identify trends in app categories, assess user satisfaction through ratings and reviews, and give app developers and marketers insight into user engagement and app performance.

Features:

Column Name	Description
App	The name of the app as listed on the Google Play Store.
Category	The category to which the app belongs (e.g., ART_AND_DESIGN, GAME).
Rating	The user rating of the app on a scale from 1 to 5.
Reviews	The number of user reviews for the app.
Size	The size of the app in megabytes (MB) or kilobytes (KB).
Installs	The number of installs/downloads of the app (e.g., 10,000+).
Type	Indicates whether the app is free or paid.
Price	The price of the app in USD, if it is a paid app.
Content Rating	The target audience for the app (e.g., Everyone, Teen, Mature 17+).
Genres	The genres associated with the app (e.g., Art & Design, Creativity).
Last Updated	The date when the app was last updated.
Current Ver	The current version of the app.
Android Ver	The minimum Android version required to run the app.
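Several columns above are stored as display strings rather than numbers, for example Installs values such as 10,000+. The sketch below converts Installs and Size to numbers. It assumes Size values use an M or k suffix (e.g. 19M, 201k) and that some rows hold a non-numeric placeholder. Both are assumptions about the raw file, so check them against the data before relying on this.

```python
import re
from typing import Optional


def parse_installs(value: str) -> Optional[int]:
    """Convert an Installs string such as '10,000+' to an int lower bound (10000)."""
    digits = re.sub(r"[^0-9]", "", value)
    return int(digits) if digits else None


def parse_size_mb(value: str) -> Optional[float]:
    """Convert a Size string to megabytes: '19M' -> 19.0, '201k' -> 201 / 1024."""
    match = re.fullmatch(r"\s*(\d*\.?\d+)\s*([MmKk])\s*", value)
    if match is None:
        return None  # e.g. a non-numeric placeholder such as 'Varies with device' (assumed)
    number = float(match.group(1))
    unit = match.group(2).lower()
    # Assumes binary units (1 MB = 1024 KB); divide by 1000 instead for decimal units.
    return number if unit == "m" else number / 1024
```

Because Installs values are bucketed lower bounds ("10,000+"), the parsed numbers work better as an ordinal feature than as exact download counts.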

Mobile Device Usage and User Behavior Dataset
kaggle.com
zip
Updated Sep 28, 2024
vala khorasani (2024). Mobile Device Usage and User Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/valakhorasani/mobile-device-usage-and-user-behavior-dataset/discussion
Available download formats
zip (11,576 bytes)
Authors
vala khorasani
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset provides a comprehensive analysis of mobile device usage patterns and user behavior classification. It contains 700 samples of user data, including metrics such as app usage time, screen-on time, battery drain, and data consumption. Each entry is categorized into one of five user behavior classes, ranging from light to extreme usage, allowing for insightful analysis and modeling.

Key Features:
- User ID: Unique identifier for each user.
- Device Model: Model of the user's smartphone.
- Operating System: The OS of the device (iOS or Android).
- App Usage Time: Daily time spent on mobile applications, measured in minutes.
- Screen On Time: Average hours per day the screen is active.
- Battery Drain: Daily battery consumption in mAh.
- Number of Apps Installed: Total apps available on the device.
- Data Usage: Daily mobile data consumption in megabytes.
- Age: Age of the user.
- Gender: Gender of the user (Male or Female).
- User Behavior Class: Classification of user behavior based on usage patterns (1 to 5).

This dataset is ideal for researchers, data scientists, and analysts interested in understanding mobile user behavior and developing predictive models in the realm of mobile technology and applications. It was primarily designed for implementing machine learning algorithms and is not a reliable source for a paper or article.
Dataset used for "A Recommender System of Buggy App Checkers for App Store...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jun 28, 2021
Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
Dataset provided by
University of Lille / Inria
Authors
Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

Dataset Collection

We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing, and/or updating apps), we captured two snapshots. The dataset D1 contains the apps available in the Google Play Store in January 2014; the second snapshot (D2) was taken in March 2014.

The apps belong to the 27 different categories defined by Google (at the time of writing the paper) and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g., tools-free, tools-paid, sports-new_free), we collected a maximum of 500 samples, resulting in a median of 1,978 apps per category.

For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star count, and the set of permission requests.

In addition, for each app, we collected up to the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and app version. None of these fields were mandatory, so some reviews lack some of these details. From all the reviews attached to an app, we only considered those associated with the latest version of the app, i.e., we discarded unversioned and old-versioned reviews. This resulted in a corpus of 1,402,717 reviews (January 2014).

Dataset Stats

Some stats about the datasets:

D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

D2 (Mar. 2014) contains 46,644 apps requesting 9,319 different permissions, and 1,361,319 user reviews.

Additional stats about the datasets are available here.

Dataset Description

To store the dataset, we created a graph database with Neo4j, in which apps and their related entities are represented as nodes connected by edges. We chose a graph database because graph visualization helps to identify connections among the data (e.g., clusters of apps sharing similar sets of permission requests).

In particular, our dataset graph contains six types of nodes:
- APP nodes containing the metadata of each app,
- PERMISSION nodes describing permission types,
- CATEGORY nodes describing app categories,
- SUBCATEGORY nodes describing app subcategories,
- USER_REVIEW nodes storing user reviews,
- TOPIC nodes storing topics mined from user reviews (using LDA).

Furthermore, there are five types of relationships: four connect APP nodes to the other node types, and one connects USER_REVIEW nodes to TOPIC nodes:

USES_PERMISSION relationships between APP and PERMISSION nodes

HAS_REVIEW between APP and USER_REVIEW nodes

HAS_TOPIC between USER_REVIEW and TOPIC nodes

BELONGS_TO_CATEGORY between APP and CATEGORY nodes

BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes
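The permission clusters mentioned above can also be approximated outside Neo4j by comparing each app's set of USES_PERMISSION targets. The sketch below assumes the APP-to-PERMISSION edges have already been exported as (package, permission) pairs; the export step and the sample package names are hypothetical. It scores app pairs by Jaccard similarity of their permission sets.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def permission_sets(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (app, permission) pairs, i.e. exported USES_PERMISSION edges, into one set per app."""
    perms: Dict[str, Set[str]] = defaultdict(set)
    for app, permission in edges:
        perms[app].add(permission)
    return dict(perms)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_app_pairs(
    edges: Iterable[Tuple[str, str]], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Return app pairs whose permission sets have Jaccard similarity >= threshold, best first."""
    perms = permission_sets(edges)
    pairs = []
    # All-pairs comparison: fine for a sample, but quadratic in the number of apps.
    for app_a, app_b in combinations(sorted(perms), 2):
        score = jaccard(perms[app_a], perms[app_b])
        if score >= threshold:
            pairs.append((app_a, app_b, score))
    pairs.sort(key=lambda item: item[2], reverse=True)
    return pairs
```

With the 38,781 apps in D1, an all-pairs scan means several hundred million comparisons, so a blocking or MinHash step would be a natural refinement for the full snapshot.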

Dataset Files Info

Neo4j 2.0 Databases

googlePlayDB1-Jan2014_neo4j_2_0.rar

googlePlayDB2-Mar2014_neo4j_2_0.rar

We provide two Neo4j databases containing the two snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. They were created with Neo4j 2.0, specifically with 'Neo4j 2.0.0-M06 Community Edition' (the latest version available when the paper was implemented in 2014).

Neo4j 3.5 Databases

googlePlayDB1-Jan2014_neo4j_3_5_28.rar

googlePlayDB2-Mar2014_neo4j_3_5_28.rar

Neo4j 2.0 is now deprecated and is no longer available in the official Neo4j Download Center, so we have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with 'Neo4j Community Edition 3.5.28', which can be downloaded from the official Neo4j Download page.

To open the databases with more recent versions of Neo4j, they must first be migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide. The first time you connect to a database, Neo4j may request credentials; the username and password are neo4j/neo4j.
DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps...
zenodo.org
data.niaid.nih.gov
Updated Jan 24, 2020
Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic (2020). DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps [Dataset]. http://doi.org/10.5281/zenodo.1296278
Unique identifier
https://doi.org/10.5281/zenodo.1296278
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic
Description
Protection against malware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers.

Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.

We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, to facilitate further research and the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin Dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied, with a maximum execution time of 10 minutes. For most applications, all the stimuli could be applied within this timeframe; in some traces, neither limit was reached because of emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are monitored with a period of two seconds. The Android emulator from the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was in a clean state whenever a new sample started, and thus to avoid interference from previously run samples (e.g., changed settings, running processes, and modifications of operating system files), the Android operating system was re-initialized before running each application. The execution process was automated by a shell script, run on a Linux PC, that used the Android Debug Bridge (adb). The script used the Monkey application exerciser to generate the stimuli. The Monkey is a command-line tool that can be run on any emulator instance or device; it sends a pseudo-random stream of user events (stimuli) into the system, acting as a stress test on the application software.

In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:

benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application

benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file

malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application

malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
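The unified files are described as concatenations of the per-app CSVs. Below is a standard-library sketch of that step. It assumes every per-app CSV in a directory shares the same header row, and it adds an `app_file` column holding each row's source file name. That column is our hypothetical addition for later grouping, not part of the released unified files.

```python
import csv
from pathlib import Path


def concatenate_per_app_csvs(input_dir: Path, output_file: Path) -> int:
    """Concatenate every *.csv in input_dir into output_file; return the number of data rows written."""
    output_resolved = output_file.resolve()
    # Collect inputs before opening the output, and never read the output file itself.
    inputs = [p for p in sorted(input_dir.glob("*.csv")) if p.resolve() != output_resolved]
    header = None
    rows_written = 0
    with output_file.open("w", newline="") as out:
        writer = csv.writer(out)
        for path in inputs:
            with path.open(newline="") as handle:
                reader = csv.reader(handle)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file: nothing to add
                if header is None:
                    header = file_header
                    # 'app_file' is a hypothetical provenance column, not part of the release.
                    writer.writerow(["app_file"] + header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path.name}")
                for row in reader:
                    writer.writerow([path.stem] + row)
                    rows_written += 1
    return rows_written
```

Raising on a header mismatch, instead of silently aligning columns, keeps a malformed per-app file from corrupting the unified output.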
Data from: A dataset from the daily use of features in Android devices
data.mendeley.com
Updated Feb 16, 2024
+ more versions
Edwin Monteiro (2024). A dataset from the daily use of features in Android devices [Dataset]. http://doi.org/10.17632/bpsrw76hgx.1
Unique identifier
https://doi.org/10.17632/bpsrw76hgx.1
Authors
Edwin Monteiro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The energy consumption of Android devices, measured via data collected from device features, is a recurring theme in the literature. To evaluate the performance of such devices, databases are generated by collecting feature data while the Android operating system is in use. This article describes a database generated from the daily use of smartphones and tablets while performing everyday tasks. The dataset contains 98 features and 10,517,165 records related to dynamic, background, app list, and static data. Device records were collected daily from ten distinct devices and stored in CSV files, which were later cleaned, preprocessed, and organized into a database; the data are publicly available in the Mendeley Data Repository.
Data from: Hall-of-Apps: The Top Android Apps Metadata Archive
data-staging.niaid.nih.gov
data.niaid.nih.gov
+1more
Updated Mar 20, 2020
Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez (2020). Hall-of-Apps: The Top Android Apps Metadata Archive [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3653366
Dataset provided by
Universidad de los Andes
Authors
Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The number of Android apps available for download is constantly increasing, exerting continuous pressure on developers to publish outstanding apps. Google Play (GP) is the default distribution channel for Android apps, and it provides mobile app users with metrics to identify and report app quality, such as rating, number of downloads, and previous users' comments. In addition to those metrics, GP presents a set of top charts that highlight outstanding apps in different categories. Both metrics and top charts help developers identify whether the community values their development decisions. Therefore, an app's presence in these top charts is valuable information for understanding the features of top apps. In this paper we present Hall-of-Apps, a dataset containing the metadata of top-chart apps extracted weekly from GP for 4 different countries over 30 weeks. The data is presented as (i) raw HTML files, (ii) a MongoDB database with all the information contained in the apps' HTML files (e.g., app description, category, general rating), and (iii) data visualizations built with the D3.js framework. A first characterization of the data, along with the URLs to retrieve it, can be found in our online appendix: https://thesoftwaredesignlab.github.io/hall-of-apps-tools/
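Since each weekly snapshot records which apps appear in the top charts, one simple measure of chart presence is the number of distinct weeks an app is charted in each country. The sketch assumes a hypothetical flattened record with `app_id`, `country`, and `week` fields; the real Hall-of-Apps MongoDB documents use their own schema, which the description does not spell out, so the field mapping would need adapting.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ChartEntry(NamedTuple):
    """One app in one weekly top chart (hypothetical flattened schema)."""
    app_id: str
    country: str
    week: int


def weeks_in_charts(entries: Iterable[ChartEntry]) -> dict[tuple[str, str], int]:
    """Count distinct charted weeks per (app_id, country) pair.

    A set per key deduplicates apps that appear in several charts
    (e.g. different categories) during the same week.
    """
    weeks: dict[tuple[str, str], set[int]] = defaultdict(set)
    for entry in entries:
        weeks[(entry.app_id, entry.country)].add(entry.week)
    return {key: len(ws) for key, ws in weeks.items()}
```

Counting distinct weeks rather than raw rows keeps an app that sits in two category charts in the same week from being double-counted.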
COVID-19 Android apps data set
data.mendeley.com
Updated May 4, 2020
Cite
Arvind Mahindru (2020). COVID-19 Android apps data set [Dataset]. http://doi.org/10.17632/k4rt99sfbt.1
Unique identifier
https://doi.org/10.17632/k4rt99sfbt.1
Dataset updated
May 4, 2020
Authors
Arvind Mahindru
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set consists of Android apps developed during the COVID-19 pandemic. Cybercriminals regularly develop malware-infected apps and upload them to different repositories.
Android malware dataset for machine learning 2
figshare.com
txt
Updated Nov 26, 2025
Cite
Suleiman Yerima (2025). Android malware dataset for machine learning 2 [Dataset]. http://doi.org/10.6084/m9.figshare.5854653.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.5854653.v1
Dataset updated
Nov 26, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Suleiman Yerima
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset consisting of feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the Drebin project and 9,476 benign apps). The dataset has been used to develop and evaluate a multilevel classifier fusion approach for Android malware detection, published in the IEEE Transactions on Cybernetics paper 'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection'. The supporting file contains further description of the feature vectors/attributes obtained via static code analysis of the Android apps.
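The class counts above are imbalanced: 5,560 + 9,476 = 15,036 apps, so roughly 37% are malware and a classifier that always predicts "benign" already reaches about 63% accuracy. The sketch below computes class fractions and "balanced" class weights using total / (number_of_classes * class_count), the same formula scikit-learn uses for `class_weight="balanced"`. The helper name is illustrative, and the description does not say whether DroidFusion itself applied any class weighting.

```python
def class_stats(counts: dict[str, int]) -> dict[str, dict[str, float]]:
    """Per-class fraction and 'balanced' weight: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        label: {"fraction": n / total, "weight": total / (n_classes * n)}
        for label, n in counts.items()
    }


# Class counts as reported in the dataset description.
COUNTS = {"malware": 5560, "benign": 9476}

if __name__ == "__main__":
    stats = class_stats(COUNTS)
    print("total apps:", sum(COUNTS.values()))
    for label, s in stats.items():
        print(f"{label}: fraction={s['fraction']:.3f}, weight={s['weight']:.3f}")
    # Majority-class baseline accuracy is simply the share of the largest class.
    print("majority baseline:", f"{max(s['fraction'] for s in stats.values()):.3f}")
```

Reporting the majority baseline alongside any accuracy figure makes it clear how much a model actually improves on trivially predicting the larger class.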
PARUDroid
data.mendeley.com
Updated May 16, 2020
Cite
Arvind Mahindru (2020). PARUDroid [Dataset]. http://doi.org/10.17632/mg5c8jxbhm.1
Unique identifier
https://doi.org/10.17632/mg5c8jxbhm.1
Dataset updated
May 16, 2020
Authors
Arvind Mahindru
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains Android apps collected from Google Play and promised repositories.
Android Malware Detection Dataset
kaggle.com
zip
Updated Feb 24, 2024
Cite
Danny Revaldo (2024). Android Malware Detection Dataset [Dataset]. https://www.kaggle.com/datasets/dannyrevaldo/android-malware-detection-dataset
Available download formats: zip (123,470 bytes)
Dataset updated
Feb 24, 2024
Authors
Danny Revaldo
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
The "Android Malware Detection Dataset" is a comprehensive collection of data designed to facilitate research in the detection and analysis of malware targeting the Android platform. This dataset encompasses a wide range of features extracted from Android applications, providing valuable insights into their behaviors and functionalities.

Key features of the dataset include:

Permission Features: Various permissions requested by Android applications, such as access to location (coarse and fine), camera, microphone, contacts, SMS, calendar, storage, and more.

System Features: Features related to system functions and controls, including access to device hardware (e.g., sensors, Bluetooth, NFC), system settings (e.g., changing network state, WiFi settings), and system services (e.g., managing accounts, managing documents).

Security-related Features: Features related to security functionalities and behaviors, encompassing permission management, authentication, encryption (e.g., cryptographic operations), and security policy enforcement.

Communication Features: Features related to communication functionalities, including sending and receiving SMS messages, making phone calls, accessing network state, and managing network connections.

Data Access Features: Features related to accessing and manipulating data, such as reading and writing to various data sources (e.g., external storage, databases), accessing user information (e.g., contacts, call logs), and accessing app-specific data.

App Lifecycle Features: Features related to managing the application lifecycle, including app installation and uninstallation, app startup and shutdown, app updates, and app permissions.

Device Control Features: Features related to controlling device behavior and settings, such as changing system settings, modifying audio settings, controlling device display, and managing device power.

Miscellaneous Features: Other miscellaneous features including accessing system logs, system services and components (e.g., camera, location manager), handling system events (e.g., incoming calls, boot completed), and interacting with system UI components.

This dataset provides researchers with a rich source of information to develop and evaluate effective malware detection and analysis techniques, ultimately contributing to the enhancement of mobile security on the Android platform.
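Permission features of this kind are commonly encoded as binary presence flags, one column per permission, so each app becomes a fixed-length 0/1 vector. A minimal sketch, assuming a hypothetical vocabulary of six real Android permission strings; the dataset's own column names and ordering are not given in the description, and `permission_vector` is an illustrative name.

```python
from typing import Iterable, Sequence

# Hypothetical vocabulary: real Android permission names, but not
# necessarily the dataset's actual columns or their order.
PERMISSIONS: list[str] = [
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
]


def permission_vector(requested: Iterable[str],
                      vocabulary: Sequence[str] = PERMISSIONS) -> list[int]:
    """Return one 0/1 flag per vocabulary entry; permissions outside it are ignored."""
    requested_set = set(requested)
    return [1 if perm in requested_set else 0 for perm in vocabulary]
```

For example, an app requesting `CAMERA` and `INTERNET` maps to `[0, 0, 1, 0, 0, 0]`: `INTERNET` is dropped because it is not in this illustrative vocabulary. Fixing the vocabulary up front keeps every app's vector aligned to the same columns.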
ITC-Net-Blend-60: A Comprehensive Dataset for Robust Mobile App...
data.mendeley.com
Updated Nov 15, 2023
Cite
Marziyeh Bayat (2023). ITC-Net-Blend-60: A Comprehensive Dataset for Robust Mobile App Identification in Real-World Network Environment - Scenario A [Dataset]. http://doi.org/10.17632/ssv23kfcgs.1
Unique identifier
https://doi.org/10.17632/ssv23kfcgs.1
Dataset updated
Nov 15, 2023
Authors
Marziyeh Bayat
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario A. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
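The naming convention can be expressed as a small builder and parser. As written, the convention has no separators between fields, so parsing is ambiguous in general. The sketch assumes the scenario ID is a single letter from A to E (one per scenario) and the trace number is a run of digits; both assumptions and the helper names are hypothetical, and if the real files use separators the pattern needs adjusting.

```python
import re
from typing import NamedTuple, Optional


class TraceName(NamedTuple):
    app: str
    scenario: str
    trace: int


def build_name(app: str, scenario: str, trace: int) -> str:
    """Format {Application Name}{Scenario ID}{#Trace}_Final.pcap as written."""
    return f"{app}{scenario}{trace}_Final.pcap"


# Assumption: scenario ID is one letter A-E and the trace number is all digits.
# The greedy app group takes everything before the last letter+digits run.
_PATTERN = re.compile(r"^(?P<app>.+)(?P<scenario>[A-E])(?P<trace>\d+)_Final\.pcap$")


def parse_name(filename: str) -> Optional[TraceName]:
    """Split a file name into its parts, or return None if it does not match."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return TraceName(match["app"], match["scenario"], int(match["trace"]))
```

Returning `None` for non-matching names lets a directory scan skip unrelated files instead of failing on them.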
Install Data APAC - Installed Apps (1st Party Data w/90M records)
datarade.ai
Cite
AI Keyboard, Install Data APAC - Installed Apps (1st Party Data w/90M records) [Dataset]. https://datarade.ai/data-products/1st-party-data-app-usage-installed-apps-app-session-bobble-ai
Available download formats: .json, .csv, .xls, .parquet
Dataset authored and provided by
AI Keyboard
Area covered
Nepal, Bangladesh, Brazil, Oman, Germany, Pakistan, Netherlands, United Arab Emirates, Philippines, France
Description
The Install App dataset provides first-party app install intelligence across the APAC region, sourced from AI-driven OS-level keyboard and utility applications. It captures granular insights into mobile app installations, updates, and user behavior, enabling precise market analytics, attribution tracking, and growth optimization.

Each record includes hashed device and advertising identifiers, application metadata (package name, app version, category), and timestamped install/update events. The is_new_install field indicates whether an installation is a first-time install or a reinstall/update, which helps distinguish new user acquisition from returning user activity, a critical signal for campaign performance and user lifecycle analytics.

Alongside app-level insights, the dataset provides detailed device intelligence (manufacturer, model, OS type/version, language, and user agent), combined with IP-based location data (country, region, city) and daily server timestamps for freshness tracking.

All data is hashed, privacy-compliant, and refreshed daily, making it ideal for organizations seeking high-quality, real-world app install signals across Android and iOS ecosystems.

📊 Key Features
• First-party, consented data from OS-level applications
• Hashed identifiers (device_id, advertising_id) for privacy-safe integration
• Install and update timestamps for temporal and behavioral analysis
• is_new_install flag to separate new installs from reinstalls or app updates
• Comprehensive app, device, and location attributes
• Daily refreshed dataset ensuring data accuracy and timeliness

⚙️ Primary Use Cases
• Mobile Attribution & User Acquisition Tracking – Identify new users vs. re-engaged ones via the is_new_install flag
• Market Intelligence & Competitive Benchmarking – Analyze install trends across app categories and geographies
• Audience Segmentation – Classify users by device type, OS version, and app install behavior
• Ad Targeting Optimization – Refine lookalike and re-engagement audiences with verified install data
• Product & Growth Analytics – Study retention, uninstall rates, and user churn patterns
• App Store Strategy – Evaluate app update frequency and version distribution

📍 Industries Benefiting
• Ad-Tech & Mar-Tech Platforms
• Mobile App Publishers & Developers
• Telecom Operators & Device OEMs
• Market Research & Analytics Firms
• E-commerce, Fintech & Gaming Companies
• Media, Entertainment & OTT Platforms

With millions of verified app installs tracked across Android and iOS, this AI-powered, consent-based dataset delivers actionable insights into app discovery, engagement, and retention, driving smarter decisions in mobile marketing, audience intelligence, and growth analytics.
TUApps
zenodo.org
zip
Updated May 16, 2024
Cite
Anonymous Anonymous; Anonymous Anonymous (2024). TUApps [Dataset]. http://doi.org/10.5281/zenodo.11201267
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.11201267
Dataset updated
May 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous Anonymous; Anonymous Anonymous
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To research the illegal activities of underground apps on Telegram, we have created a dataset called TUApps. TUApps is a progressively growing dataset of underground apps, collected from September 2023 to February 2024, consisting of a total of 1,000 underground apps and 200 million messages distributed across 71,332 Telegram channels.
In the process of creating this dataset, we followed strict ethical standards to ensure the lawful use of the data and the protection of user privacy. The dataset includes the following files:
(1) dataset.zip: We have packaged the underground app samples. The naming of Android app files is based on the SHA256 hash of the file, and the naming of iOS app files is based on the SHA256 hash of the publishing webpage.
(2) code.zip: We have packaged the code used for crawling data from Telegram and for performing data analysis.
(3) message.zip: We have packaged the messages crawled from Telegram; the files are named after the corresponding Telegram channels.
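For the Android samples in dataset.zip, the SHA256-based naming allows an integrity check after download: rehash each file and compare the digest with its name. A sketch, assuming the file name minus any extension is the lowercase hex digest; the helper names are illustrative. This check does not apply to the iOS samples, whose names are hashes of the publishing webpage rather than of the file itself.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large APKs are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path: Path) -> bool:
    """Assumes the file stem is the lowercase hex SHA-256 of the file's contents."""
    return path.stem.lower() == sha256_of_file(path)
```

Any sample for which `name_matches_hash` returns `False` was either corrupted in transit or does not follow the assumed naming scheme.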
Availability of code and messages
Once our research paper is accepted, the user-message data and the code used for data collection and analysis will be made available only on request, to researchers who agree to adhere to strict ethical principles and maintain the confidentiality of the data.

