70 datasets found
  1. Screen Time and App Usage Dataset (iOS/Android)

    • kaggle.com
    zip
    Updated Apr 19, 2025
    Cite
    Khushi Yadav (2025). Screen Time and App Usage Dataset (iOS/Android) [Dataset]. https://www.kaggle.com/datasets/khushikyad001/screen-time-and-app-usage-dataset-iosandroid
    Available download formats: zip (157038 bytes)
    Dataset updated
    Apr 19, 2025
    Authors
    Khushi Yadav
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset simulates anonymized mobile screen time and app usage data collected from Android/iOS users over a 3-month period (Jan–April 2024). It captures daily usage trends across various app categories including:

    Productivity: Google Docs, Notion, Slack

    Entertainment: YouTube, Netflix, TikTok

    Social Media: Instagram, WhatsApp, Facebook

    Utilities: Chrome, Gmail, Maps

    For YouTube, additional engagement statistics such as views, likes, and comments are included to analyze video popularity and content consumption behavior.

    The dataset enables exploration of:

    Productivity vs. entertainment screen time patterns

    Daily usage fluctuations

    App-specific user engagement

    Correlation between time spent and user interactions

    YouTube content virality metrics

    This is a great resource for:

    EDA projects

    Behavioral clustering

    Dashboard development

    Time series and anomaly detection

    Building recommendation or focus-assistive apps

  2. Mobile App Usage | 1st Party | 3B+ events verified, US consumers |...

    • datarade.ai
    • omnitrafficdata.mfour.com
    .csv, .parquet
    Updated Dec 13, 2021
    + more versions
    Cite
    MFour (2021). Mobile App Usage | 1st Party | 3B+ events verified, US consumers | Event-level iOS & Android [Dataset]. https://datarade.ai/data-products/mobile-app-usage-1st-party-3b-events-verified-us-consum-mfour
    Available download formats: .csv, .parquet
    Dataset updated
    Dec 13, 2021
    Dataset authored and provided by
    MFour
    Area covered
    United States of America
    Description

    This dataset encompasses mobile smartphone application (app) usage, collected from over 150,000 triple-opt-in first-party US Daily Active Users (DAU). Use it for measurement, attribution or surveying to understand the why. iOS and Android operating system coverage.

    Tie app usage to web and location events using anonymized PanelistID for omnichannel consumer journey understanding.

  3. RICO dataset

    • kaggle.com
    zip
    Updated Dec 1, 2021
    Cite
    Onur Gunes (2021). RICO dataset [Dataset]. https://www.kaggle.com/datasets/onurgunes1993/rico-dataset
    Available download formats: zip (6703669364 bytes)
    Dataset updated
    Dec 1, 2021
    Authors
    Onur Gunes
    Description

    Context

    Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.

    Content

    Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.

    Acknowledgements

    UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN https://interactionmining.org/rico

    Inspiration

    The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
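
    The layout-image step described above can be sketched as follows. This is a minimal illustration, not Rico's actual preprocessing code: the leaf record shape `(left, top, right, bottom, is_text)`, the grid size, and the cell codes are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical leaf-element record: pixel bounds plus a text/non-text flag.
Leaf = Tuple[int, int, int, int, bool]

EMPTY, TEXT, NON_TEXT = 0, 1, 2  # assumed cell codes


def rasterize_layout(leaves: List[Leaf], screen_w: int, screen_h: int,
                     grid_w: int = 8, grid_h: int = 16) -> List[List[int]]:
    """Paint each leaf's bounding box onto a coarse grid.

    Text and non-text leaves get different codes, so the grid keeps
    layout information without any pixel-level image processing.
    """
    grid = [[EMPTY] * grid_w for _ in range(grid_h)]
    for left, top, right, bottom, is_text in leaves:
        # Scale pixel coordinates to grid cells: floor the start edge,
        # ceil the end edge, and clamp both to the grid bounds.
        c0 = max(0, left * grid_w // screen_w)
        c1 = min(grid_w, -(-right * grid_w // screen_w))
        r0 = max(0, top * grid_h // screen_h)
        r1 = min(grid_h, -(-bottom * grid_h // screen_h))
        for r in range(r0, r1):
            for c in range(c0, c1):
                grid[r][c] = TEXT if is_text else NON_TEXT
    return grid


# A title bar of text across the top and an image in the lower-left quadrant.
example = rasterize_layout(
    [(0, 0, 1080, 240, True), (0, 960, 540, 1920, False)],
    screen_w=1080, screen_h=1920,
)
```

    A grid like this could be flattened and fed to an autoencoder. The 64-dimensional embedding mentioned above would then be the output of the encoder half, which is not shown here.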

  4. Google Play Store Android Apps Dataset in CSV Format

    • crawlfeeds.com
    csv, zip
    Updated Nov 9, 2024
    Cite
    Crawl Feeds (2024). Google Play Store Android Apps Dataset in CSV Format [Dataset]. https://crawlfeeds.com/datasets/google-play-store-apps-dataset
    Available download formats: zip, csv
    Dataset updated
    Nov 9, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock valuable insights with the Google Play Store Android Apps Dataset in CSV format, featuring detailed information on thousands of Android apps available on the Google Play Store. This comprehensive dataset includes key attributes such as App Name, App Logo, Category, Description, Average Rating, Ratings Count, In-app Purchases, Operating System, Company, Content Rating, Images, Email, Additional Information, and more.

    Perfect for market researchers, data scientists, app developers, and analysts, this dataset allows for deep analysis of app performance, user preferences, and industry trends. With data on app descriptions, content ratings, in-app purchases, and company information, you can track trends in the mobile app market, evaluate user satisfaction, and conduct competitive analysis.

    The dataset is ideal for businesses looking to optimize app strategies, enhance user experience, and improve app performance based on real user feedback. Easily import the data into your favorite analysis tools to gain actionable insights for your app development or research.

    With regularly updated data scraped directly from the Google Play Store, the Google Play Store Android Apps Dataset is an invaluable resource for anyone looking to explore trends, track performance, or enhance their app strategies.

  5. Data from: AndroCT: Ten Years of App Call Traces in Android

    • zenodo.org
    • explore.openaire.eu
    application/gzip, txt
    Updated Mar 8, 2022
    Cite
    Wen Li; Xiaoqin Fu; Haipeng Cai (2022). AndroCT: Ten Years of App Call Traces in Android [Dataset]. http://doi.org/10.5281/zenodo.6336104
    Available download formats: application/gzip, txt
    Dataset updated
    Mar 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wen Li; Xiaoqin Fu; Haipeng Cai
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A large-scale dataset of dynamic profiles, based on function calls, of 35,974 benign and malicious Android apps from 10 historical years (2010 through 2019). Function calls are a commonly used means of modeling program behaviors, and can support various code analysis approaches to assuring software correctness, reliability, and security. In particular, our dataset includes dynamic profiles of each app obtained by exercising it for the same length of time (10 minutes) with randomly generated inputs on both an emulator and a real device. This enables app analyses that reason about app behaviors from an evolutionary perspective while revealing differences in app behavior across run-time hardware platforms. Since we have 20 yearly datasets associated with 35,974 unique Android apps across the 10 years, profiling these apps took 12,000 hours. Considering the cost of filtering out apps that were originally sampled but that we were unable to profile (for reasons such as broken APKs, apps that could not be executed because of incompatibility issues, apps that could not be instrumented, etc.), we took over two years to produce all these traces. We hope to save future researchers the time of producing such a set of dynamic data and so enable their empirical and technical work.

    ==================

    Thanks for your interest in our dataset. Collecting this dataset took tremendous computational and human effort. Thus, please observe the following restrictions in using our dataset:

    - Do not redistribute this dataset without our consent.
    - Do not make commercial usage of this dataset.
    - Get a faculty, or someone in a permanent position, to agree and commit to these conditions.
    - When publishing your work that uses our dataset, please cite the following MSR 2021 data paper.


    @inproceedings{AndroidCT,
    title = {AndroCT: Ten Years of App Call Traces in Android},
    author = {Wen Li and Xiaoqin Fu and Haipeng Cai},
    booktitle = {The 18th International Conference on Mining Software Repositories (MSR 2021), Data Showcase Track},
    year = {2021},
    }

  6. User mobile app interaction data

    • kaggle.com
    zip
    Updated Jan 15, 2025
    Cite
    Mohamed Moslemani (2025). User mobile app interaction data [Dataset]. https://www.kaggle.com/datasets/mohamedmoslemani/user-mobile-app-interaction-data/data
    Available download formats: zip (6809111 bytes)
    Dataset updated
    Jan 15, 2025
    Authors
    Mohamed Moslemani
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.

    Key Features Included

    User & Session Metadata

    • User ID: A unique integer identifier for each synthetic user.
    • Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
    • IP Address: Fake IP addresses generated via Faker to simulate different network origins.
    • Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
    • Session Duration: An approximate measure (in seconds) of how long a user remained active.

    Device & Technical Details

    • Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
    • Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
    • Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
    • Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.

    Location & Locale

    • Location Country & City: Random global locations generated using Faker.
    • App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).

    User Properties

    • Battery Level: The phone’s battery level as a percentage (0–100).
    • Memory Usage (MB): Approximate memory consumption at the time of the event.
    • Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
    • User Age: Random integer ranging from teenagers to seniors (13–80).
    • Phone Number: Fake phone numbers generated via Faker.
    • Push Enabled: Boolean flag indicating if the user has push notifications turned on.

    Event-Level Interactions

    • Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
    • Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
    • Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
    • App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).

    Data Quality & “Noise”

    To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:

    • Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
    • Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
    • Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.

    These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
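
    One way to practice on the three kinds of noise described above is to repair categorical fields against a known vocabulary. The sketch below uses Python's standard `difflib`. The vocabulary `KNOWN_OS`, the function name `repair_category`, and the 0.8 similarity cutoff are assumptions made for illustration, not part of the dataset.

```python
import difflib
from typing import Optional, Sequence

# Assumed vocabulary for the "Device OS" field (the description names Android/iOS).
KNOWN_OS = ["Android", "iOS"]


def repair_category(value: Optional[str], vocabulary: Sequence[str],
                    cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly corrupted value onto the closest known category.

    - Missing or blank values stay missing (None).
    - Single-character typos snap to the nearest vocabulary entry.
    - Injected random strings fall below the cutoff and become None.
    """
    if value is None or not value.strip():
        return None
    matches = difflib.get_close_matches(value.strip(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


cleaned = [repair_category(v, KNOWN_OS)
           for v in ["Android", "Andro1d", None, "X9fQ2z", "iOS"]]
```

    A cutoff this high is a deliberate trade-off: it repairs single-character edits in longer words but will reject a typo in a very short value such as “iOS”, where one wrong character changes a third of the string.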

    Usage & Applications

    • Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
    • Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
    • Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
    • Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).

    Important Notes & Disclaimer

    • Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
    • Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.

  7. Data from: WhatsNextApp: LSTM-based next-app prediction with app usage...

    • resodate.org
    Updated May 5, 2022
    Cite
    Katerina Katsarou; Geunhye Yu; Felix Beierle (2022). WhatsNextApp: LSTM-based next-app prediction with app usage sequences [Dataset]. http://doi.org/10.14279/depositonce-15575
    Dataset updated
    May 5, 2022
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Katerina Katsarou; Geunhye Yu; Felix Beierle
    Description

    Next-app prediction can help enhance user interface design, pre-loading of apps, and network optimizations. Prior work has explored this topic using several different approaches, but challenges such as the user cold-start problem, data sparsity, and privacy concerns related to contextual data such as location histories persist. The user cold-start problem occurs when a user has recently registered with the smartphone app system and there is not enough information about their preferences and their history of smartphone usage. In this work, we try to address the above issues. We introduce WhatsNextApp, an approach based on LSTM (Long Short-Term Memory) networks using sequences of app usage logs. Our approach is inspired by Word Embeddings and treats sequences of app usage logs as sequences of words. We collect a real-life data set consisting of 975 Android users with over 22 million app usage events. We build a generic (user-independent) WhatsNextApp model, and the evaluation with our data set shows that it outperforms related studies for existing users, where we achieve a recall@8 (recall for the top 8 apps) of 92%. For the user cold-start problem with the 500 most frequent apps, we achieve a recall@8 of 82.7%.
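
    The recall@8 metric quoted above can be illustrated with a short sketch. It assumes the usual next-item definition, in which each test case has exactly one true next app and counts as a hit when that app appears among the top k ranked predictions. The function name and input shapes are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Sequence


def recall_at_k(ranked_predictions: Sequence[Sequence[str]],
                true_next_apps: Sequence[str], k: int = 8) -> float:
    """Fraction of test cases whose true next app is in the top-k predictions."""
    if len(ranked_predictions) != len(true_next_apps):
        raise ValueError("one ranked list is required per true next app")
    if not true_next_apps:
        return 0.0
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_next_apps)
               if truth in ranked[:k])
    return hits / len(true_next_apps)


# Toy example with made-up app names, ranked from most to least likely.
predictions: List[List[str]] = [
    ["mail", "chat", "maps"],
    ["chat", "maps", "mail"],
    ["maps", "mail", "chat"],
]
truths = ["mail", "mail", "camera"]
```

    With these toy inputs, the true app is ranked first in one of three cases and appears somewhere in the top three in two of three cases, so the score rises as k grows.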

  8. Google Play Store Apps Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yusuf Delikkaya (2024). Google Play Store Apps Dataset [Dataset]. https://www.kaggle.com/datasets/yusufdelikkaya/google-play-store-apps-dataset/code
    Available download formats: zip (319016 bytes)
    Dataset updated
    Oct 30, 2024
    Authors
    Yusuf Delikkaya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description:

    • The dataset comprises anonymized data on apps available on the Google Play Store, capturing various aspects such as ratings, downloads, and categorization.
    • The dataset has 10,841 entries, with some columns containing missing values, particularly in "Rating," "Type," "Content Rating," "Current Ver," and "Android Ver".
    • This dataset can be utilized for analyzing trends in mobile app usage, user preferences, and app performance metrics across different categories.
    • It can aid in understanding the impact of factors like app size, rating, and category on user downloads and popularity.
    • This dataset can be utilized for analyzing app popularity, user preferences, and the relationship between app features (e.g., size, price) and downloads.
    • It can help in identifying trends in app categories, assessing user satisfaction through ratings and reviews, and providing insights for app developers and marketers on user engagement and app performance.

    Features:

    Column Name | Description
    App | The name of the app as listed on the Google Play Store.
    Category | The category to which the app belongs (e.g., ART_AND_DESIGN, GAME).
    Rating | The user rating of the app on a scale from 1 to 5.
    Reviews | The number of user reviews for the app.
    Size | The size of the app in megabytes (MB) or kilobytes (KB).
    Installs | The number of installs/downloads of the app (e.g., 10,000+).
    Type | Indicates whether the app is free or paid.
    Price | The price of the app in USD, if it is a paid app.
    Content Rating | The target audience for the app (e.g., Everyone, Teen, Mature 17+).
    Genres | The genres associated with the app (e.g., Art & Design, Creativity).
    Last Updated | The date when the app was last updated.
    Current Ver | The current version of the app.
    Android Ver | The minimum Android version required to run the app.
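
    Two of the columns above are stored as display strings rather than numbers, so they usually need parsing before analysis. The sketch below is one possible approach. The helper names are hypothetical, and the exact string formats (a trailing "+" on Installs, an "M" or "k" suffix on Size) are assumptions based on the examples in the column descriptions. Anything that does not match is returned as None.

```python
import re
from typing import Optional


def parse_installs(raw: str) -> Optional[int]:
    """Turn an install bucket such as '10,000+' into the integer 10000."""
    digits = re.sub(r"[,+\s]", "", raw)
    return int(digits) if digits.isdigit() else None


def parse_size_mb(raw: str) -> Optional[float]:
    """Turn a size string such as '19M' or '512k' into megabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([MmKk])[Bb]?\s*", raw)
    if match is None:
        return None  # unrecognized format, treated as missing
    value = float(match.group(1))
    # Assumed convention: 1 MB = 1024 KB.
    return value if match.group(2).upper() == "M" else value / 1024


installs = parse_installs("10,000+")
size_mb = parse_size_mb("19M")
```

    Returning None for unparseable values keeps the missing-data handling explicit, which matters here because the description notes that several columns contain missing values.
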
  9. Mobile Device Usage and User Behavior Dataset

    • kaggle.com
    zip
    Updated Sep 28, 2024
    Cite
    vala khorasani (2024). Mobile Device Usage and User Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/valakhorasani/mobile-device-usage-and-user-behavior-dataset/discussion
    Available download formats: zip (11576 bytes)
    Dataset updated
    Sep 28, 2024
    Authors
    vala khorasani
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset provides a comprehensive analysis of mobile device usage patterns and user behavior classification. It contains 700 samples of user data, including metrics such as app usage time, screen-on time, battery drain, and data consumption. Each entry is categorized into one of five user behavior classes, ranging from light to extreme usage, allowing for insightful analysis and modeling.

    Key Features:
    - User ID: Unique identifier for each user.
    - Device Model: Model of the user's smartphone.
    - Operating System: The OS of the device (iOS or Android).
    - App Usage Time: Daily time spent on mobile applications, measured in minutes.
    - Screen On Time: Average hours per day the screen is active.
    - Battery Drain: Daily battery consumption in mAh.
    - Number of Apps Installed: Total apps available on the device.
    - Data Usage: Daily mobile data consumption in megabytes.
    - Age: Age of the user.
    - Gender: Gender of the user (Male or Female).
    - User Behavior Class: Classification of user behavior based on usage patterns (1 to 5).

    This dataset is ideal for researchers, data scientists, and analysts interested in understanding mobile user behavior and developing predictive models in the realm of mobile technology and applications. This dataset was primarily designed for implementing machine learning algorithms and is not a reliable source for a paper or article.

  10. Dataset used for "A Recommender System of Buggy App Checkers for App Store...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 28, 2021
    Cite
    Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
    Dataset updated
    Jun 28, 2021
    Dataset provided by
    University of Lille / Inria
    Authors
    Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

    Dataset Collection

    We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing and/or updating apps), we updated the dataset twice. The dataset D1 contains available apps in the Google Play Store in January 2014. Then, we created a new snapshot (D2) of the Google Play Store in March 2014.

    The apps belong to the 27 different categories defined by Google (at the time of writing the paper), and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median number of 1,978 apps per category.

    For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.

    In addition, for each app, we collected up to a maximum of the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, thus several reviews lack some of these details. From all the reviews attached to an app, we only considered the reviews associated with the latest version of the app; that is, we discarded unversioned and old-versioned reviews. This resulted in a corpus of 1,402,717 reviews (Jan. 2014).

    Dataset Stats

    Some stats about the datasets:

    • D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

    • D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.

    Additional stats about the datasets are available here.

    Dataset Description

    To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).

    In particular, our dataset graph contains six types of nodes:

    • APP nodes containing metadata of each app
    • PERMISSION nodes describing permission types
    • CATEGORY nodes describing app categories
    • SUBCATEGORY nodes describing app subcategories
    • USER_REVIEW nodes storing user reviews
    • TOPIC nodes describing topics mined from user reviews (using LDA)

    Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:

    • USES_PERMISSION relationships between APP and PERMISSION nodes
    • HAS_REVIEW between APP and USER_REVIEW nodes
    • HAS_TOPIC between USER_REVIEW and TOPIC nodes
    • BELONGS_TO_CATEGORY between APP and CATEGORY nodes
    • BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes
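
    The idea of finding apps that share similar sets of permission requests can be illustrated outside the graph database. The sketch below takes the USES_PERMISSION relationships as a plain mapping from app to permission set and scores pairs with Jaccard similarity. The similarity measure, the 0.5 threshold, and the app and permission names are all assumptions for illustration. The dataset itself stores these relationships in Neo4j.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_app_pairs(uses_permission: Dict[str, Set[str]],
                      threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return app pairs whose permission sets overlap at least `threshold`."""
    pairs = []
    for app_a, app_b in combinations(sorted(uses_permission), 2):
        score = jaccard(uses_permission[app_a], uses_permission[app_b])
        if score >= threshold:
            pairs.append((app_a, app_b, score))
    return pairs


# Made-up APP -> PERMISSION edges.
edges = {
    "app1": {"INTERNET", "CAMERA"},
    "app2": {"INTERNET", "CAMERA", "GPS"},
    "app3": {"SMS"},
}
pairs = similar_app_pairs(edges)
```

    Comparing every pair is quadratic in the number of apps, so for snapshots with tens of thousands of apps a real analysis would more likely run inside the graph database or use an approximate method.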

    Dataset Files Info

    Neo4j 2.0 Databases

    googlePlayDB1-Jan2014_neo4j_2_0.rar

    googlePlayDB2-Mar2014_neo4j_2_0.rar

    We provide two Neo4j databases containing the two snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. The databases were created with Neo4j 2.0, specifically with the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available when the paper was implemented in 2014).

    Neo4j 3.5 Databases

    googlePlayDB1-Jan2014_neo4j_3_5_28.rar

    googlePlayDB2-Mar2014_neo4j_3_5_28.rar

    Neo4j 2.0 is now deprecated and is not available for download from the official Neo4j Download Center. We have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28'. The tool can be downloaded from the official Neo4j Download page.

      In order to open the databases with more recent versions of Neo4j, the databases must be first migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide.
    
      The first time you connect to the Neo4j database, it may request credentials. The username and password are: neo4j/neo4j
    
  11. DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic (2020). DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps [Dataset]. http://doi.org/10.5281/zenodo.1296278
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic
    Description

    Protection against malware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers.

    Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.

    We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin Dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces, neither of the two limits is reached due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are sampled with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that made use of Android Debug Bridge (adb) and that was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. The Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.

    In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:

    • benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application

    • benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file

    • malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application

    • malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
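
    The relationship between the per-app and unified files can be sketched with Python's standard `csv` module. This is an illustration only: the feature column names (`cpu`, `mem`) and the added `app` column are hypothetical, the sketch assumes every per-app CSV shares the same header, and it does not claim to reproduce how the authors built their unified files.

```python
import csv
import io
from typing import Dict, TextIO


def unify_csv(per_app: Dict[str, TextIO], out: TextIO,
              id_column: str = "app") -> int:
    """Concatenate per-app CSV streams into one stream with a single header.

    Each output row is tagged with the app it came from. Returns the
    number of data rows written.
    """
    writer = None
    rows_written = 0
    for app_name, handle in per_app.items():
        reader = csv.DictReader(handle)
        for row in reader:
            if writer is None:
                # Take the header from the first file that has data rows.
                fieldnames = [id_column] + list(reader.fieldnames)
                writer = csv.DictWriter(out, fieldnames=fieldnames)
                writer.writeheader()
            writer.writerow({id_column: app_name, **row})
            rows_written += 1
    return rows_written


# In-memory stand-ins for two per-app files sampled every two seconds.
sources = {
    "a.apk": io.StringIO("cpu,mem\n1,10\n2,20\n"),
    "b.apk": io.StringIO("cpu,mem\n3,30\n"),
}
unified = io.StringIO()
total_rows = unify_csv(sources, unified)
```

    Tagging each row with its source keeps per-application boundaries recoverable after concatenation, which matters for time-series features because consecutive rows from different apps are not consecutive samples.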

  12. Data from: A dataset from the daily use of features in Android devices

    • data.mendeley.com
    Updated Feb 16, 2024
    + more versions
    Cite
    Edwin Monteiro (2024). A dataset from the daily use of features in Android devices [Dataset]. http://doi.org/10.17632/bpsrw76hgx.1
    Dataset updated
    Feb 16, 2024
    Authors
    Edwin Monteiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The energy consumption of Android devices, measured via data collected from device features, is a recurring theme in the literature. To evaluate the performance of such devices, databases are generated by collecting feature data while the Android operating system is in use. This article describes a database generated from the daily use of smartphones and tablets while performing everyday tasks. The dataset contains 98 features and 10,517,165 records related to dynamic, background, app list and static data. Device records were collected every day from ten distinct devices and stored in CSV files, which were then cleaned and preprocessed to build the database that is publicly available in the Mendeley Data Repository.

  13. Data from: Hall-of-Apps: The Top Android Apps Metadata Archive

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Mar 20, 2020
    Cite
    Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez (2020). Hall-of-Apps: The Top Android Apps Metadata Archive [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3653366
    Dataset updated
    Mar 20, 2020
    Dataset provided by
    Universidad de los Andes
    Authors
    Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The number of Android apps available for download is constantly increasing, exerting continuous pressure on developers to publish outstanding apps. Google Play (GP) is the default distribution channel for Android apps, and it provides mobile app users with metrics to identify and report app quality, such as rating, number of downloads, previous users' comments, etc. In addition to those metrics, GP presents a set of top charts that highlight the outstanding apps in different categories. Both metrics and top app charts help developers to identify whether their development decisions are well valued by the community. Therefore, app presence in these top charts is valuable information for understanding the features of top apps. In this paper we present Hall-of-Apps, a dataset containing top charts' apps metadata extracted (weekly) from GP, for 4 different countries, during 30 weeks. The data is presented as (i) raw HTML files, (ii) a MongoDB database with all the information contained in the apps' HTML files (e.g., app description, category, general rating, etc.), and (iii) data visualizations built with the D3.js framework. A first characterization of the data along with the URLs to retrieve it can be found in our online appendix: https://thesoftwaredesignlab.github.io/hall-of-apps-tools/

  14. COVID-19 Android apps data set

    • data.mendeley.com
    Updated May 4, 2020
    Cite
    Arvind Mahindru (2020). COVID-19 Android apps data set [Dataset]. http://doi.org/10.17632/k4rt99sfbt.1
    Dataset updated
    May 4, 2020
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set consists of Android apps developed during the COVID-19 pandemic. Cybercriminals regularly develop malware-infected apps and upload them to different repositories.

  15. Android malware dataset for machine learning 2

    • figshare.com
    txt
    Updated Nov 26, 2025
    + more versions
    Cite
    Suleiman Yerima (2025). Android malware dataset for machine learning 2 [Dataset]. http://doi.org/10.6084/m9.figshare.5854653.v1
    Available download formats: txt
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Suleiman Yerima
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset consisting of feature vectors of 215 attributes extracted from 15,036 applications (5,560 malware apps from the Drebin project and 9,476 benign apps). The dataset has been used to develop and evaluate a multilevel classifier fusion approach for Android malware detection, published in the IEEE Transactions on Cybernetics paper 'DroidFusion: A Novel Multilevel Classifier Fusion Approach for Android Malware Detection'. The supporting file contains further description of the feature vectors/attributes obtained via static code analysis of the Android apps.

  16. PARUDroid

    • data.mendeley.com
    Updated May 16, 2020
    Cite
    Arvind Mahindru (2020). PARUDroid [Dataset]. http://doi.org/10.17632/mg5c8jxbhm.1
    Dataset updated
    May 16, 2020
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains Android apps collected from Google Play and promised repositories.

  17. Android Malware Detection Dataset

    • kaggle.com
    zip
    Updated Feb 24, 2024
    Cite
    Danny Revaldo (2024). Android Malware Detection Dataset [Dataset]. https://www.kaggle.com/datasets/dannyrevaldo/android-malware-detection-dataset
    Available download formats: zip (123470 bytes)
    Dataset updated
    Feb 24, 2024
    Authors
    Danny Revaldo
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The "Android Malware Detection Dataset" is a comprehensive collection of data designed to facilitate research in the detection and analysis of malware targeting the Android platform. This dataset encompasses a wide range of features extracted from Android applications, providing valuable insights into their behaviors and functionalities.

    Key features of the dataset include:

    • Permission Features: Various permissions requested by Android applications, such as access to location (coarse and fine), camera, microphone, contacts, SMS, calendar, storage, and more.
    • System Features: Features related to system functions and controls, including access to device hardware (e.g., sensors, Bluetooth, NFC), system settings (e.g., changing network state, WiFi settings), and system services (e.g., managing accounts, managing documents).
    • Security-related Features: Features related to security functionalities and behaviors, encompassing permission management, authentication, encryption (e.g., cryptographic operations), and security policy enforcement.
    • Communication Features: Features related to communication functionalities, including sending and receiving SMS messages, making phone calls, accessing network state, and managing network connections.
    • Data Access Features: Features related to accessing and manipulating data, such as reading and writing to various data sources (e.g., external storage, databases), accessing user information (e.g., contacts, call logs), and accessing app-specific data.
    • App Lifecycle Features: Features related to managing the application lifecycle, including app installation and uninstallation, app startup and shutdown, app updates, and app permissions.
    • Device Control Features: Features related to controlling device behavior and settings, such as changing system settings, modifying audio settings, controlling device display, and managing device power.
    • Miscellaneous Features: Other miscellaneous features including accessing system logs, system services and components (e.g., camera, location manager), handling system events (e.g., incoming calls, boot completed), and interacting with system UI components.

    This dataset provides researchers with a rich source of information to develop and evaluate effective malware detection and analysis techniques, ultimately contributing to the enhancement of mobile security on the Android platform.

  18. ITC-Net-Blend-60: A Comprehensive Dataset for Robust Mobile App...

    • data.mendeley.com
    Updated Nov 15, 2023
    + more versions
    Cite
    Marziyeh Bayat (2023). ITC-Net-Blend-60: A Comprehensive Dataset for Robust Mobile App Identification in Real-World Network Environment - Scenario A [Dataset]. http://doi.org/10.17632/ssv23kfcgs.1
    Dataset updated
    Nov 15, 2023
    Authors
    Marziyeh Bayat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario A. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
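
    The naming convention above can be turned into a small parser. The sketch below rests on assumptions that the description does not confirm: that the five scenario IDs are the single letters A to E, that the trace number is a run of digits, and that the three parts may or may not be joined by underscores (the separators are not visible in the convention as printed here). The file names in the usage note are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: scenario IDs are single letters A-E, trace numbers are digits,
# and the parts may optionally be separated by "_".
TRACE_NAME = re.compile(
    r"^(?P<app>.+?)_?(?P<scenario>[A-E])_?(?P<trace>\d+)_Final\.pcap$"
)


class TraceName(NamedTuple):
    app: str
    scenario: str
    trace: int


def parse_trace_name(filename: str) -> Optional[TraceName]:
    """Split a PCAP file name into application, scenario ID and trace number.

    Returns None when the name does not fit the assumed pattern.
    """
    match = TRACE_NAME.match(filename)
    if match is None:
        return None
    return TraceName(match["app"], match["scenario"], int(match["trace"]))
```

    Under these assumptions, `parse_trace_name("WhatsApp_A_3_Final.pcap")` yields the application `WhatsApp`, scenario `A` and trace `3`. Application names that themselves end in a letter A to E followed by digits would be ambiguous, so the pattern should be checked against the actual archive before use.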

  19. Install Data APAC - Installed Apps (1st Party Data w/90M records)

    • datarade.ai
    Cite
    AI Keyboard, Install Data APAC - Installed Apps (1st Party Data w/90M records) [Dataset]. https://datarade.ai/data-products/1st-party-data-app-usage-installed-apps-app-session-bobble-ai
    Available download formats: .json, .csv, .xls, .parquet
    Dataset authored and provided by
    AI Keyboard
    Area covered
    Nepal, Bangladesh, Brazil, Oman, Germany, Pakistan, Netherlands, United Arab Emirates, Philippines, France
    Description

    The Install App dataset provides comprehensive, first-party app install intelligence across the APAC region, sourced from AI-driven OS-level keyboard and utility applications. It captures highly granular insights into mobile app installations, updates, and user behavior, enabling precise market analytics, attribution tracking, and growth optimization.

    Each record includes hashed device and advertising identifiers, application metadata (package name, app version, category), and timestamped install/update events. The field is_new_install indicates whether the app installation is first-time or an existing reinstall/update, helping distinguish between new user acquisition and returning user activity — a critical signal for campaign performance and user lifecycle analytics.
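
    As an illustration of how the is_new_install flag could be used, the sketch below tallies first-time installs and reinstalls/updates per app. The field name `package_name` is a hypothetical stand-in for the package-name attribute mentioned above, and the flag is assumed to be a real boolean rather than a string such as "false"; neither detail is confirmed by the listing.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def split_installs(
    records: Iterable[Mapping[str, object]]
) -> Tuple[Counter, Counter]:
    """Count new installs and reinstalls/updates per package.

    Returns (new_installs, returning), each keyed by package name.
    """
    new_installs: Counter = Counter()
    returning: Counter = Counter()
    for record in records:
        package = record["package_name"]  # hypothetical field name
        # ASSUMPTION: is_new_install is a boolean, not the text "true"/"false".
        if record["is_new_install"]:
            new_installs[package] += 1
        else:
            returning[package] += 1
    return new_installs, returning
```

    Keeping the two tallies separate mirrors the distinction the description draws between new user acquisition and returning user activity.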

    Alongside app-level insights, the dataset provides detailed device intelligence — including manufacturer, model, OS type/version, language, and user agent — combined with IP-based location data (country, region, city) and daily server timestamps for freshness tracking.

    All data is hashed, privacy-compliant, and refreshed daily, making it ideal for organizations seeking high-quality, real-world app install signals across Android and iOS ecosystems.

    📊 Key Features
    • First-party, consented data from OS-level applications
    • Hashed identifiers (device_id, advertising_id) for privacy-safe integration
    • Install and update timestamps for temporal and behavioral analysis
    • is_new_install flag to separate new installs from reinstalls or app updates
    • Comprehensive app, device, and location attributes
    • Daily refreshed dataset ensuring data accuracy and timeliness

    ⚙️ Primary Use Cases
    • Mobile Attribution & User Acquisition Tracking – Identify new users vs. re-engaged ones via the is_new_install flag
    • Market Intelligence & Competitive Benchmarking – Analyze install trends across app categories and geographies
    • Audience Segmentation – Classify users by device type, OS version, and app install behavior
    • Ad Targeting Optimization – Refine lookalike and re-engagement audiences with verified install data
    • Product & Growth Analytics – Study retention, uninstall rates, and user churn patterns
    • App Store Strategy – Evaluate app update frequency and version distribution

    📍 Industries Benefiting
    • Ad-Tech & Mar-Tech Platforms
    • Mobile App Publishers & Developers
    • Telecom Operators & Device OEMs
    • Market Research & Analytics Firms
    • E-commerce, Fintech & Gaming Companies
    • Media, Entertainment & OTT Platforms

    With millions of verified app installs tracked across Android and iOS, this AI-powered, consent-based dataset delivers actionable insights into app discovery, engagement, and retention, driving smarter decisions in mobile marketing, audience intelligence, and growth analytics.

  20. TUApps

    • zenodo.org
    zip
    Updated May 16, 2024
    Cite
    Anonymous Anonymous; Anonymous Anonymous (2024). TUApps [Dataset]. http://doi.org/10.5281/zenodo.11201267
    Available download formats: zip
    Dataset updated
    May 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous Anonymous; Anonymous Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To research the illegal activities of underground apps on Telegram, we have created a dataset called TUApps. TUApps is a progressively growing dataset of underground apps, collected from September 2023 to February 2024, consisting of a total of 1,000 underground apps and 200 million messages distributed across 71,332 Telegram channels.
    In the process of creating this dataset, we followed strict ethical standards to ensure the lawful use of the data and the protection of user privacy. The dataset includes the following files:
    (1) dataset.zip: We have packaged the underground app samples. The naming of Android app files is based on the SHA256 hash of the file, and the naming of iOS app files is based on the SHA256 hash of the publishing webpage.
    (2) code.zip: We have packaged the code used for crawling data from Telegram and for performing data analysis.
    (3) message.zip: We have packaged the messages crawled from Telegram; the files are named after the Telegram channels they came from.
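
    The hash-based naming in item (1) can be reproduced with a standard SHA256 digest. The sketch below is an assumption-laden illustration: the description does not say whether the iOS name is derived from the publishing page's URL or from its content, so the text helper simply hashes whatever string it is given, and both function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_text(text: str) -> str:
    """Return the hex SHA256 digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

    Comparing `sha256_of_file` on a downloaded Android sample with its file name is a quick integrity check after unpacking dataset.zip.
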
    Availability of code and messages
    Upon acceptance of our research paper, the dataset containing user messages and the code used for data collection and analysis will only be made available upon request to researchers who agree to adhere to strict ethical principles and maintain the confidentiality of the data.

Cite
Khushi Yadav (2025). Screen Time and App Usage Dataset (iOS/Android) [Dataset]. https://www.kaggle.com/datasets/khushikyad001/screen-time-and-app-usage-dataset-iosandroid

Screen Time and App Usage Dataset (iOS/Android)

Track app usage trends with focus on productivity vs. entertainment

