100+ datasets found
  1. User mobile app interaction data

    • kaggle.com
    Updated Jan 15, 2025
    Cite
    Mohamed Moslemani (2025). User mobile app interaction data [Dataset]. https://www.kaggle.com/datasets/mohamedmoslemani/user-mobile-app-interaction-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed Moslemani
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows, each representing a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.

    Key Features Included

    User & Session Metadata

    • User ID: A unique integer identifier for each synthetic user.
    • Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
    • IP Address: Fake IP addresses generated via Faker to simulate different network origins.
    • Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
    • Session Duration: An approximate measure (in seconds) of how long a user remained active.

    Device & Technical Details

    • Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
    • Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
    • Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
    • Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.

    Location & Locale

    • Location Country & City: Random global locations generated using Faker.
    • App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).

    User Properties

    • Battery Level: The phone’s battery level as a percentage (0–100).
    • Memory Usage (MB): Approximate memory consumption at the time of the event.
    • Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
    • User Age: Random integer ranging from teenagers to seniors (13–80).
    • Phone Number: Fake phone numbers generated via Faker.
    • Push Enabled: Boolean flag indicating if the user has push notifications turned on.

    Event-Level Interactions

    • Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
    • Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
    • Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
    • App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).

    Data Quality & “Noise”

    To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:

    • Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
    • Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
    • Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.

    These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
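    As an illustration of the kind of cleaning this noise invites, the sketch below repairs single-character typos in a categorical field by fuzzy-matching against a list of expected values. The list `KNOWN_OS`, the function name `clean_os`, and the 0.8 similarity cutoff are assumptions made for this example and are not part of the dataset; values that match nothing (blank cells, injected random strings) are mapped to None.

```python
import difflib

# Hypothetical list of expected categories, taken from the description above.
KNOWN_OS = ["Android", "iOS"]


def clean_os(value, known=KNOWN_OS, cutoff=0.8):
    """Return the closest known category, or None if nothing is close enough."""
    if value is None or not str(value).strip():
        return None  # blank or missing cell
    text = str(value).strip().lower()
    # Compare case-insensitively, then map back to the canonical spelling.
    canonical = {name.lower(): name for name in known}
    matches = difflib.get_close_matches(text, list(canonical), n=1, cutoff=cutoff)
    return canonical[matches[0]] if matches else None


print(clean_os("Andro1d"))  # single-character typo
print(clean_os("X7kQ9z"))   # injected random string
```

    The cutoff trades recall for precision: a lower value repairs more typos but risks mapping injected strings onto a real category.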

    Usage & Applications

    • Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
    • Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
    • Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
    • Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).

    Important Notes & Disclaimer

    • Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
    • Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.

  2. LSApp: Large dataset of Sequential mobile App usage

    • ieee-dataport.org
    Updated Feb 25, 2025
    Cite
    Cunquan Qu (2025). LSApp: Large dataset of Sequential mobile App usage [Dataset]. https://ieee-dataport.org/documents/lsapp-large-dataset-sequential-mobile-app-usage
    Dataset updated
    Feb 25, 2025
    Authors
    Cunquan Qu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    During the study period

  3. Dating App Behavior Dataset 2025

    • kaggle.com
    Updated Apr 11, 2025
    Cite
    Keyush nisar (2025). Dating App Behavior Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/keyushnisar/dating-app-behavior-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Keyush nisar
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset provides a synthetic representation of user behavior on a fictional dating app. It contains 50,000 records with 19 features capturing demographic details, app usage patterns, swipe tendencies, and match outcomes. The data was generated programmatically to simulate realistic user interactions, making it ideal for exploratory data analysis (EDA), machine learning modeling (e.g., predicting match outcomes), or studying user behavior trends in online dating platforms.

    Key features include gender, sexual orientation, location type, income bracket, education level, user interests, app usage time, swipe ratios, likes received, mutual matches, and match outcomes (e.g., "Mutual Match," "Ghosted," "Catfished"). The dataset is designed to be diverse and balanced, with categorical, numerical, and labeled variables for various analytical purposes.

    Usage

    This dataset can be used for:

    • Exploratory Data Analysis (EDA): Investigate correlations between demographics, app usage, and match success.
    • Machine Learning: Build models to predict match outcomes or user engagement levels.
    • Social Studies: Analyze trends in dating app behavior across different demographics.
    • Feature Engineering Practice: Experiment with transforming categorical and numerical data.

  4. Data from: Temporal and Cultural Limits of Privacy in Smartphone App Usage

    • data.dtu.dk
    • figshare.com
    txt
    Updated Jan 29, 2021
    Cite
    Laura Alessandretti (2021). Temporal and Cultural Limits of Privacy in Smartphone App Usage [Dataset]. http://doi.org/10.11583/DTU.13650797.v1
    Available download formats: txt
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    Technical University of Denmark
    Authors
    Laura Alessandretti
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The file anonymized_app_data.csv contains a sample of smartphone app-fingerprints from 20,000 randomly selected individuals, collected in May 2016. Each record in the table corresponds to a (user, app) pair and indicates that a given app was used at least once by a given user during May 2016. The table contains the following fields:

    • user_id: hashed user ID
    • app_id: hashed ID of the smartphone app

    The data accompanies the publication "Temporal and Cultural Limits of Privacy in Smartphone App Usage".
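    To make the (user, app) record structure concrete, the sketch below groups such pairs into one app set per user and counts how many users have a set shared with nobody else. It assumes the CSV has a header row naming the user_id and app_id columns; the function names and the tiny in-memory sample are made up for illustration, and the uniqueness measure is only a simple example, not the method of the accompanying publication.

```python
import csv
import io
from collections import defaultdict


def load_fingerprints(csv_file):
    """Group (user_id, app_id) rows into {user_id: set of app_ids}."""
    fingerprints = defaultdict(set)
    for row in csv.DictReader(csv_file):
        fingerprints[row["user_id"]].add(row["app_id"])
    return dict(fingerprints)


def unique_fraction(fingerprints):
    """Fraction of users whose complete app set is shared with no other user."""
    if not fingerprints:
        return 0.0
    counts = defaultdict(int)
    for apps in fingerprints.values():
        counts[frozenset(apps)] += 1
    unique = sum(1 for apps in fingerprints.values() if counts[frozenset(apps)] == 1)
    return unique / len(fingerprints)


# Made-up sample standing in for anonymized_app_data.csv.
sample = io.StringIO("user_id,app_id\nu1,a1\nu1,a2\nu2,a1\nu3,a1\n")
fp = load_fingerprints(sample)
print(fp["u1"], unique_fraction(fp))
```

    For the real file, the `io.StringIO` object would be replaced by an opened file handle.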

  5. Data from: Google Play Store Dataset

    • opendatabay.com
    Updated Jun 15, 2025
    Cite
    Bright Data (2025). Google Play Store Dataset [Dataset]. https://www.opendatabay.com/data/premium/33624898-8133-421d-9b3b-42f76e1e4fe2
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    Bright Data
    Area covered
    Website Analytics & User Experience
    Description

    Google Play Store dataset to explore detailed information about apps, including ratings, descriptions, updates, and developer details. Popular use cases include app performance analysis, market research, and consumer behavior insights.

    Use our Google Play Store dataset to explore detailed information about apps available on the platform, including app titles, developers, monetization features, user ratings, reviews, and more. This dataset also includes data on app descriptions, safety measures, download counts, recent updates, and compatibility, providing a complete overview of app performance and features.

    Tailored for app developers, marketers, and researchers, this dataset offers valuable insights into user preferences, app trends, and market dynamics. Whether you're optimizing app development, conducting competitive analysis, or tracking app performance, the Google Play Store dataset is an essential resource for making data-driven decisions in the mobile app ecosystem.

    Dataset Features

    • url: The URL link to the app’s detail page on the Google Play Store.
    • title: The name of the application.
    • developer: The developer or company behind the app.
    • monetization_features: Information regarding how the app generates revenue (e.g., in-app purchases, ads).
    • images: Links or references to images associated with the app.
    • about: Details or a summary description of the app.
    • data_safety: Information regarding data safety and privacy practices.
    • rating: The overall rating of the app provided by its users.
    • number_of_reviews: The total count of user reviews received.
    • star_reviews: A breakdown of reviews by star ratings.
    • reviews: Reviews and user feedback about the app.
    • what_new: Information on the latest updates or features added to the app.
    • more_by_this_developer: Other apps by the same developer.
    • content_rating: The content rating which guides suitability based on user age.
    • downloads: The download count or range indicating the app’s popularity.
    • country: The country associated with the app listing.
    • app_category: The category or genre under which the app is classified.

    Distribution

    • Data Volume: 17 Columns and 65.54M Rows
    • Format: CSV

    Usage

    This dataset is ideal for a variety of applications:

    • App Market Analysis: Enables market researchers to extract insights on app popularity, engagement, and trends across different categories.
    • Machine Learning: Can be used by data scientists to build recommendation engines or sentiment analysis models based on app review data.
    • User Behavior Studies: Facilitates academic or industrial research into user preferences and behavior with respect to mobile applications.

    Coverage

    • Geographic Coverage: global.

    License

    CUSTOM Please review the respective licenses below: 1. Data Provider's License - Bright Data Master Service Agreement

    Who Can Use It

    • Data Scientists: To train machine learning models for app popularity prediction, sentiment analysis, or recommendation systems.
    • Researchers: For academic or scientific studies into market trends, consumer behavior, and app performance analysis.
    • Businesses: For strategic analysis, developing market insights, or enhancing app development and user engagement strategies.

    Suggested Dataset Name

    1. Play store Insights
    2. Android App Scope
    3. Market Analytics
    4. Play Store Metrics Vault

    5. AppTrend360: Google Play Edition

    Pricing

    Based on Delivery frequency

    ~Up to $0.0025 per record. Min order $250

    Approximately 10M new records are added each month. Approximately 13.8M records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.

    • Monthly: New snapshot each month, 12 snapshots/year. Paid monthly.
    • Quarterly: New snapshot each quarter, 4 snapshots/year. Paid quarterly.
    • Bi-annual: New snapshot every 6 months, 2 snapshots/year. Paid twice a year.
    • One-time purchase: New snapshot, one-time delivery. Paid once.

  6. Collect APP usage time data from the “App Usage” app

    • ieee-dataport.org
    Updated May 5, 2025
    Cite
    Zhi-Ren Tsai (2025). Collect APP usage time data from the “App Usage” app [Dataset]. https://ieee-dataport.org/documents/collect-app-usage-time-data-app-usage-app
    Dataset updated
    May 5, 2025
    Authors
    Zhi-Ren Tsai
    Description

    games

  7. App Downloads Data (2025)

    • businessofapps.com
    Updated Sep 1, 2017
    Cite
    Business of Apps (2017). App Downloads Data (2025) [Dataset]. https://www.businessofapps.com/data/app-statistics/
    Dataset updated
    Sep 1, 2017
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    App Download Key Statistics; App and Game Downloads; iOS App and Game Downloads; Google Play App and Game Downloads; Game Downloads; iOS Game Downloads; Google Play Game Downloads; App Downloads; iOS App...

  8. Data from: AndroCT: Ten Years of App Call Traces in Android

    • zenodo.org
    • explore.openaire.eu
    application/gzip, txt
    Updated Mar 7, 2022
    Cite
    Wen Li; Xiaoqin Fu; Haipeng Cai (2022). AndroCT: Ten Years of App Call Traces in Android [Dataset]. http://doi.org/10.5281/zenodo.4470320
    Available download formats: application/gzip, txt
    Dataset updated
    Mar 7, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wen Li; Xiaoqin Fu; Haipeng Cai
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A large-scale dataset of dynamic profiles, based on function calls, of 35,974 benign and malicious Android apps from 10 historical years (2010 through 2019). Function calls are a commonly used means of modeling program behaviors, which may contribute to various code analysis approaches to assuring software correctness, reliability, and security. In particular, our dataset includes dynamic profiles of each app resulting from the same length of time (10 minutes) of being exercised by randomly generated inputs on both an emulator and a real device, enabling interesting and useful app analyses that reason about app behaviors from an evolutionary perspective while revealing the differences in app behaviors on different run-time hardware platforms. Since we have 20 yearly datasets associated with 35,974 unique Android apps across the 10 years, profiling these apps took 12,000 hours. Considering the costs of filtering out apps that were originally sampled but that we were unable to profile (for reasons such as broken APKs, not being executable because of incompatibility issues, not being instrumentable, etc.), we took over two years to produce all these traces. We hope to save future researchers' time in producing such a set of dynamic data to enable their empirical and technical work.
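    As a rough check of the 12,000-hour figure, the sketch below multiplies the number of apps by the 10-minute run length and by the two platforms. It assumes that every one of the 35,974 apps was profiled once on the emulator and once on the real device, which the description implies but does not state outright; the variable names are chosen for this example.

```python
apps = 35_974          # unique apps, from the description
minutes_per_run = 10   # each app is exercised for 10 minutes
platforms = 2          # emulator + real device (assumed: every app ran on both)

total_hours = apps * minutes_per_run * platforms / 60
print(round(total_hours))  # 11991, i.e. roughly 12,000 hours
```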

    ==================

    Thanks for your interest in our dataset. Collecting this dataset took tremendous computational and human effort. Thus, please observe the following restrictions in using our dataset:

    - Do not redistribute this dataset without our consent.
    - Do not make commercial usage of this dataset.
    - Get a faculty member, or someone in a permanent position, to agree and commit to these conditions.
    - When publishing your work that uses our dataset, please cite the following MSR 2021 data paper.


    @inproceedings{AndroidCT,
    title = {AndroCT: Ten Years of App Call Traces in Android},
    author = {Wen Li and Xiaoqin Fu and Haipeng Cai},
    booktitle = {The 18th International Conference on Mining Software Repositories (MSR 2021), Data Showcase Track},
    year = {2021},
    }

  9. Worldwide Mobile App User Behavior Dataset

    • dataverse.harvard.edu
    doc, xlsx
    Updated Sep 28, 2014
    Cite
    Harvard Dataverse (2014). Worldwide Mobile App User Behavior Dataset [Dataset]. http://doi.org/10.7910/DVN/27459
    Available download formats: doc (56320), xlsx (7037534)
    Dataset updated
    Sep 28, 2014
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2012
    Area covered
    Worldwide
    Description

    We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include USA, China, Japan, Germany, France, Brazil, UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about: (1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download; (2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income; and (3) their personality, using the Big-Five personality traits. This dataset contains the results of the survey.

  10. Data used by new top app downloads worldwide 2018-2020

    • statista.com
    Updated Jul 6, 2021
    Cite
    Statista (2021). Data used by new top app downloads worldwide 2018-2020 [Dataset]. https://www.statista.com/statistics/1127265/data-used-top-apps-downloads-worldwide/
    Dataset updated
    Jul 6, 2021
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the first quarter of 2020, a total of 596 petabytes of data were used by new downloads of the top 250 apps worldwide. In comparison, in the same quarter of the preceding year, only 446 petabytes of data were used by new downloads of the most popular apps. The increase in data used by app downloads is a result of the coronavirus pandemic, during which stay-home recommendations were common worldwide.

  11. Data from: Apple App Store Dataset

    • opendatabay.com
    .other
    Updated Jun 7, 2025
    Cite
    Bright Data (2025). Apple App Store Dataset [Dataset]. https://www.opendatabay.com/data/premium/cd5a7748-e9da-4d59-96cd-96a0c95f7994
    Available download formats: .other
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    Area covered
    Website Analytics & User Experience
    Description

    Apple App Store dataset to explore detailed information on app popularity, user feedback, and monetization features. Popular use cases include market trend analysis, app performance evaluation, and consumer behavior insights in the mobile app ecosystem.

    Use our Apple App Store dataset to gain comprehensive insights into the mobile app ecosystem, including app popularity, user ratings, monetization features, and user feedback. This dataset covers various aspects of apps, such as descriptions, categories, and download metrics, offering a full picture of app performance and trends.

    Tailored for marketers, developers, and industry analysts, this dataset allows you to track market trends, identify emerging apps, and refine promotional strategies. Whether you're optimizing app development, analyzing competitive landscapes, or forecasting market opportunities, the Apple App Store dataset is an essential tool for making data-driven decisions in the ever-evolving mobile app industry.

    Dataset Features

    • url: The URL linking to the app’s page on the Apple App Store.
    • title: The name of the app.
    • sub_title: A brief subtitle or tagline for the app.
    • developer: The name of the entity or individual that developed the app.
    • top_charts: Indicates if the app appears in top charts.
    • monetization_features: Information on monetization aspects (such as in-app purchases or advertisements).
    • image: A reference to the main app image.
    • screenshots: Contains screenshot images of the app.
    • description: Detailed app description outlining main features.
    • what_new: Details on the latest updates or new features.
    • rating: The overall rating based on user reviews.
    • number_of_raters: The total number of users who have rated the app.
    • reviews_by_stars: Breakdown of the number of reviews by star rating.
    • reviews: An aggregation of user reviews.
    • events: Any associated events or promotions.
    • data_linked_to_you: Indicates if any data is linked to the user.
    • seller: The entity responsible for selling or distributing the app.
    • category: The category or genre of the app.
    • languages: Languages supported by the app.
    • copyright: Copyright information provided by the developer.
    • size: The file size of the app.
    • compatibility: Device or OS compatibility details.
    • age_rating: The recommended age rating for the app.
    • price: The price of the app.
    • In_app_purchases: Details on in-app purchase options.
    • support: Information related to app support.
    • more_by_this_developer: Suggestions for other apps by the same developer.
    • you_might_also_like: Recommendations for similar apps.
    • app_support: Additional support details.
    • privacy_policy: Link or reference to the app’s privacy policy.
    • developer_website: The website of the app developer.
    • featured_in: Information on any features or showcases the app has been part of.
    • country: The country from which the app’s data was sourced.
    • timestamp: A timestamp indicating when the data record was last updated.
    • latest_app_version: The most recent version of the app available.
    • app_id: A unique identifier for the app.

    Distribution

    • Data Volume: 36 Columns and 68M Rows
    • Format: CSV

    Usage

    This dataset is versatile and can be used for various applications:

    • Market Analysis: Analyze app pricing strategies, monetization features, and category distribution to understand market trends and opportunities in the App Store. This can help developers and businesses make informed decisions about their app development and pricing strategies.
    • User Experience Research: Study the relationship between app ratings, number of reviews, and app features to understand what drives user satisfaction. The detailed review data and ratings can provide insights into user preferences and pain points.
    • Competitive Intelligence: Track and analyze apps within specific categories, comparing features, pricing, and user engagement metrics to identify successful patterns and market gaps. Particularly useful for developers planning new apps or improving existing ones.
    • Performance Prediction: Build predictive models using features like app size, category, pricing, and language support to forecast potential app success metrics. This can help in making data-driven decisions during app development.
    • Localization Strategy: Analyze the languages supported and regional performance to inform decisions about app localization and international market expansion.

    Coverage

    • Geographic Coverage: Global

    License

    CUSTOM Please review the respective licenses below: 1. Data Provider's License - Bright Data Master Service Agreement

    Who Can Use It

    • Data Scientists: Can leverage this dataset for training machine learning algorithms and building predictive models concerning app tr
  12. Google Play Store Apps

    • kaggle.com
    Updated Feb 3, 2019
    Cite
    Lavanya (2019). Google Play Store Apps [Dataset]. https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 3, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lavanya
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    [ADVISORY] IMPORTANT

    Instructions for citation:

    If you use this dataset anywhere in your work, please cite it as follows: L. Gupta, "Google Play Store Apps," Feb 2019. [Online]. Available: https://www.kaggle.com/lava18/google-play-store-apps

    Context

    While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found that the iTunes App Store page uses a nicely indexed, appendix-like structure that allows for simple and easy web scraping. The Google Play Store, on the other hand, uses sophisticated modern-day techniques (like dynamic page load) with jQuery, making scraping more challenging.

    Content

    Each app (row) has values for category, rating, size, and more.

    Acknowledgements

    This information is scraped from the Google Play Store. This app information would not be available without it.

    Inspiration

    The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!

  13. Myket Android Application Install Dataset

    • paperswithcode.com
    Updated Aug 12, 2023
    + more versions
    Cite
    Erfan Loghmani; Mohammadamin Fazli (2023). Myket Android Application Install Dataset [Dataset]. https://paperswithcode.com/dataset/myket-android-application-install
    Dataset updated
    Aug 12, 2023
    Authors
    Erfan Loghmani; Mohammadamin Fazli
    Description

    This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.

    Data Creation

    The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.

    We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.
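    The three filtering steps described above can be sketched roughly as follows. This is not the authors' code: the function name `subsample`, its parameters, and the choice to count a user's interactions after the app filter has been applied are assumptions made for illustration, and the toy data is invented.

```python
import random
from collections import Counter


def subsample(interactions, top_apps=8000, min_interactions=32, n_users=10_000, seed=0):
    """Filter a list of (user_id, app_name, timestamp) tuples in three steps."""
    # Step 1: keep only interactions with the most-installed applications.
    app_counts = Counter(app for _, app, _ in interactions)
    keep_apps = {app for app, _ in app_counts.most_common(top_apps)}
    filtered = [row for row in interactions if row[1] in keep_apps]

    # Step 2: keep users with more than `min_interactions` remaining interactions.
    user_counts = Counter(user for user, _, _ in filtered)
    eligible = sorted(u for u, c in user_counts.items() if c > min_interactions)

    # Step 3: randomly sample users and keep only their interactions.
    rng = random.Random(seed)
    chosen = set(rng.sample(eligible, min(n_users, len(eligible))))
    return [row for row in filtered if row[0] in chosen]


# Invented toy data with small thresholds so the effect is visible.
toy = [("u1", "a", 1), ("u1", "a", 2), ("u1", "b", 3),
       ("u2", "a", 4), ("u2", "c", 5), ("u3", "b", 6)]
result = subsample(toy, top_apps=2, min_interactions=1, n_users=5)
print(result)
```

    With the toy thresholds, app "c" is dropped in step 1 and only user "u1" has more than one remaining interaction, so the result contains that user's three rows.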

    Data Structure

    The dataset has two main files.

    • myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, resulting in associated columns being all zero.
    • app_info_sample.csv: This file comprises features associated with applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category are included. These features provide insights into the applications present in the dataset.

    Dataset Details

    • Total Instances: 694,121 install interaction instances
    • Instances Format: Triplets of user_id, app_name, timestamp
    • 10,000 users and 7,988 Android applications
    • Item features for 7,606 applications

    For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.

    Top 20 Most Installed Applications

    | Package Name | Count of Interactions |
    | ---------------------------------- | --------------------- |
    | com.instagram.android | 15292 |
    | ir.resaneh1.iptv | 12143 |
    | com.tencent.ig | 7919 |
    | com.ForgeGames.SpecialForcesGroup2 | 7797 |
    | ir.nomogame.ClutchGame | 6193 |
    | com.dts.freefireth | 6041 |
    | com.whatsapp | 5876 |
    | com.supercell.clashofclans | 5817 |
    | com.mojang.minecraftpe | 5649 |
    | com.lenovo.anyshare.gps | 5076 |
    | ir.medu.shad | 4673 |
    | com.firsttouchgames.dls3 | 4641 |
    | com.activision.callofduty.shooter | 4357 |
    | com.tencent.iglite | 4126 |
    | com.aparat | 3598 |
    | com.kiloo.subwaysurf | 3135 |
    | com.supercell.clashroyale | 2793 |
    | co.palang.QuizOfKings | 2589 |
    | com.nazdika.app | 2436 |
    | com.digikala | 2413 |

    Comparison with SNAP Datasets

    The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the project. The table below provides a comparative overview of the key dataset characteristics:

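    | Dataset   | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User |
    | --------- | ------ | ------ | ------------- | ----------------------------- | ----------------------------- |
    | Myket     | 10,000 | 7,988  | 694,121       | 69.4                          | 54.6                          |
    | LastFM    | 980    | 1,000  | 1,293,103     | 1,319.5                       | 158.2                         |
    | Reddit    | 10,000 | 984    | 672,447       | 67.2                          | 7.9                           |
    | Wikipedia | 8,227  | 1,000  | 157,474       | 19.1                          | 2.2                           |
    | MOOC      | 7,047  | 97     | 411,749       | 58.4                          | 25.3                          |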

    The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.
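The two per-user columns in the comparison can be recomputed from raw interaction triplets. The sketch below is only an illustration: the function name `per_user_averages` and the toy rows are hypothetical, and it assumes the interactions have already been loaded as (user_id, app_name, timestamp) tuples rather than read from myket.csv directly.

```python
from collections import defaultdict


def per_user_averages(interactions):
    """Return (avg interactions per user, avg unique items per user).

    `interactions` is an iterable of (user_id, item_id, timestamp) triplets.
    This helper is hypothetical and not part of the Myket repository.
    """
    counts = defaultdict(int)         # user_id -> number of interactions
    unique_items = defaultdict(set)   # user_id -> set of distinct items
    for user_id, item_id, _timestamp in interactions:
        counts[user_id] += 1
        unique_items[user_id].add(item_id)

    n_users = len(counts)
    if n_users == 0:
        return 0.0, 0.0

    avg_interactions = sum(counts.values()) / n_users
    avg_unique = sum(len(items) for items in unique_items.values()) / n_users
    return avg_interactions, avg_unique


# Toy sample (made-up rows, not taken from the dataset).
toy = [
    ("u1", "com.whatsapp", 1.0),
    ("u1", "com.whatsapp", 2.0),   # repeated interaction with the same item
    ("u1", "com.aparat", 3.0),
    ("u2", "com.digikala", 4.0),
]
avg_interactions, avg_unique = per_user_averages(toy)

# The Myket row of the table follows from the published totals:
myket_avg = round(694_121 / 10_000, 1)
```

A small gap between the two averages, as in the Myket row (69.4 versus 54.6), indicates that users rarely interact with the same item twice; a large gap, as in the Reddit row, indicates heavy repetition.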

    Citation

    If you use this dataset in your research, please cite the following preprint:

    @misc{loghmani2023effect,
      title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
      author={Erfan Loghmani and MohammadAmin Fazli},
      year={2023},
      eprint={2308.06862},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
    }

  14. Data for health app usage BCT and UG integration

    • narcis.nl
    • uvaauas.figshare.com
    Updated Feb 4, 2020
    Cite
    G.J. de Bruijn (2020). Data for health app usage BCT and UG integration [Dataset]. http://doi.org/10.21942/uva.11798580
    Explore at:
    Dataset updated
    Feb 4, 2020
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    G.J. de Bruijn
    Description

    These data accompany the publication on health app usage in Dutch adults (collected May 2018) using BCT and UG concepts.

  15. U.S. entertainment app data usage 2017

    • statista.com
    Updated Jan 27, 2022
    Cite
    Statista (2022). U.S. entertainment app data usage 2017 [Dataset]. https://www.statista.com/statistics/724906/entertainment-app-data-usage/
    Explore at:
    Dataset updated
    Jan 27, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2017
    Area covered
    United States
    Description

    This statistic presents the average monthly data usage of selected entertainment apps in the United States as of June 2017. During the measured period, the average data usage of Netflix amounted to 2,889.5 MB per month.

  16. Dataset: Gold standard dataset for explainability need detection in app...

    • zenodo.org
    zip
    Updated May 20, 2025
    Cite
    Martin Obaidi (2025). Dataset: Gold standard dataset for explainability need detection in app reviews. [Dataset]. http://doi.org/10.5281/zenodo.13273192
    Explore at:
    Available download formats: zip
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Obaidi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We crawled 90,000 app reviews from both Google Play Store and Apple App Store, including reviews from both free and paid apps. These reviews were filtered for explainability needs, and after this process, 4,495 reviews remained. Among them, 2,185 reviews indicated an explanation need, while 2,310 did not. This resulting gold standard dataset was used to train and evaluate several machine learning models and rule-based approaches for detecting explanation needs in app reviews.

    The dataset includes both balanced and unbalanced evaluation sets, as well as the original crawled data from October 2023. In addition to machine learning approaches, rule-based methods optimized for F1 score, precision, and recall are also included.

    We provide several pre-trained machine learning models (including BERT, SetFit, AdaBoost, K-Nearest Neighbor, Logistic Regression, Naive Bayes, Random Forest, and SVM) along with training scripts and evaluation notebooks. These models can be applied directly or retrained using the included datasets.

    For further details on the structure and usage of the dataset, please refer to the README.md file within the provided ZIP archive.

  17. Coronavirus-themed Mobile Apps (Malware) Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 21, 2021
    Cite
    covid19apps (2021). Coronavirus-themed Mobile Apps (Malware) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3875975
    Explore at:
    Dataset updated
    Apr 21, 2021
    Dataset authored and provided by
    covid19apps
    Description

    As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of malicious online activities, including email scams, ransomware, and malicious domains. As the number of cases continues to surge, malicious campaigns that use the coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps.

    However, apart from a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset is available for researchers to support COVID-19 related cybersecurity studies.

    We are working to create a daily growing COVID-19 related mobile app dataset. As of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number is growing daily and the dataset is updated weekly. For more details, please visit https://covid19apps.github.io

    This dataset includes the following files:

    (1) covid19apps.xlsx

    This file lists information on all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.

    (2) covid19apps.zip

    The APK samples of the COVID-19 themed apps are stored in zip files. To reduce the size of any single file, the samples are divided across multiple zip files. Each APK file is named after its SHA256 hash.

    If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619

    (Accepted to Empirical Software Engineering)

    @misc{wang2021virus,
      title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware},
      author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu},
      year={2021},
      eprint={2005.14619},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
    }

  18. Global cellular data traffic used for apps 2025, by category

    • statista.com
    • ai-chatbox.pro
    Updated Feb 17, 2025
    Cite
    Statista (2025). Global cellular data traffic used for apps 2025, by category [Dataset]. https://www.statista.com/statistics/383715/global-mobile-data-traffic-share/
    Explore at:
    Dataset updated
    Feb 17, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, video apps accounted for around 76 percent of global mobile data usage every month. Second-ranked social networking accounted for eight percent of global mobile data volume. The two categories can easily overlap, though, as users can watch videos in video applications as well as in social networking applications.

    Most popular social media platforms with video content

    Facebook, YouTube, and Instagram were among the most popular social networks in the world as of October 2021. Each of these platforms allows users to post, share, and watch video content on a mobile device. One of the fastest growing global brands, TikTok, is also a social media platform where users can share video content. In September 2021, the platform reached 1 billion monthly active users.

    Leading types of mobile video content in the U.S.

    The United States ranked third in the world by number of smartphone users as of May 2021, with around 270 million users. Mobile content usage in the country was therefore among the highest in the world, and a big part of it was video content. As of the third quarter of 2021, more than 80 percent of survey respondents in the United States reported watching YouTube on their mobile devices. Social media videos were the second most popular type of content for mobile audiences, with almost six in 10 respondents watching videos on social media platforms like TikTok and Twitter.

  19. Average data use of leading navigation apps in the U.S. 2020

    • statista.com
    Updated Nov 30, 2022
    Cite
    Statista (2022). Average data use of leading navigation apps in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/1186009/data-use-leading-us-navigation-apps/
    Explore at:
    Dataset updated
    Nov 30, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2020
    Area covered
    United States
    Description

    As of October 2020, Apple Maps used an average of 1.83 MB of mobile data per 20 minutes, while Google Maps used only 0.73 MB. Waze, which is also owned by Google, used the least at 0.23 MB per 20 minutes.

  20. Social media usage in University

    • kaggle.com
    Updated Feb 8, 2025
    Cite
    IttiphoN (2025). Social media usage in University [Dataset]. https://www.kaggle.com/datasets/ittiphon/social-media-usage
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    Kaggle
    Authors
    IttiphoN
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset consists of daily data collected from users of various social media apps. It includes information such as the number of active users, the frequency of user interactions, and the time spent on each app daily. The data is categorized by individual apps, allowing insights into user behavior and engagement patterns on a per-day basis.

Cite
Mohamed Moslemani (2025). User mobile app interaction data [Dataset]. https://www.kaggle.com/datasets/mohamedmoslemani/user-mobile-app-interaction-data

User mobile app interaction data

Generated interaction data of users on a mobile phone with an application

Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 15, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mohamed Moslemani
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.

Key Features Included

User & Session Metadata

User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.

Device & Technical Details

Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.

Location & Locale

Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).

User Properties

Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.

Event-Level Interactions

Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).

Data Quality & “Noise”

To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:

Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.

These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
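One way to handle the three kinds of noise in a categorical column such as Device OS is fuzzy matching against the list of valid values. The sketch below uses Python's standard `difflib` module; the helper name `clean_category`, the `KNOWN_OS` list, and the 0.8 similarity cutoff are illustrative assumptions rather than part of the dataset or its generator.

```python
import difflib

# Assumed set of valid values for the Device OS column (Android/iOS per the description).
KNOWN_OS = ["Android", "iOS"]


def clean_category(value, known_values, cutoff=0.8):
    """Map a possibly corrupted value to a known category, or None.

    - Missing or blank cells stay None.
    - Single-character typos are repaired by fuzzy matching.
    - Strings that resemble no known value (e.g. injected random
      strings) are treated as missing.
    The cutoff is an assumption; tune it per column.
    """
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    if text in known_values:
        return text
    matches = difflib.get_close_matches(text, known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Made-up sample values covering each kind of noise.
raw = ["Android", "Andro1d", None, "x9Qz7Lk", "iOS"]
cleaned = [clean_category(v, KNOWN_OS) for v in raw]
```

Short categories such as “iOS” leave little room for fuzzy matching, since one wrong character in a three-letter word already drops the similarity below the cutoff, so a lower cutoff or an explicit alias table may work better for them.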

Usage & Applications

Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).

Important Notes & Disclaimer

Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
