100+ datasets found

User mobile app interaction data
kaggle.com
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohamed Moslemani (2025). User mobile app interaction data [Dataset]. https://www.kaggle.com/datasets/mohamedmoslemani/user-mobile-app-interaction-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 15, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Mohamed Moslemani
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.

Key Features Included

User & Session Metadata

User ID: A unique integer identifier for each synthetic user. Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions. IP Address: Fake IP addresses generated via Faker to simulate different network origins. Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred. Session Duration: An approximate measure (in seconds) of how long a user remained active. Device & Technical Details

Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers. Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.). Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”). Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G. Location & Locale

Location Country & City: Random global locations generated using Faker. App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.). User Properties

Battery Level: The phone’s battery level as a percentage (0–100). Memory Usage (MB): Approximate memory consumption at the time of the event. Subscription Status: Boolean flag indicating if the user is subscribed to a premium service. User Age: Random integer ranging from teenagers to seniors (13–80). Phone Number: Fake phone numbers generated via Faker. Push Enabled: Boolean flag indicating if the user has push notifications turned on. Event-Level Interactions

Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.). Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”). Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating). App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”). Data Quality & “Noise” To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:

Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.” Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data. Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong. These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.

Usage & Applications

Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario. Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc. Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification). Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).

Important Notes & Disclaimer

Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals. Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
i
LSApp: Large dataset of Sequential mobile App usage
ieee-dataport.org
Updated Feb 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cunquan Qu (2025). LSApp: Large dataset of Sequential mobile App usage [Dataset]. https://ieee-dataport.org/documents/lsapp-large-dataset-sequential-mobile-app-usage
Explore at:
Dataset updated
Feb 25, 2025
Authors
Cunquan Qu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
During the study period
Dating App Behavior Dataset 2025
kaggle.com
Updated Apr 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Keyush nisar (2025). Dating App Behavior Dataset 2025 [Dataset]. https://www.kaggle.com/datasets/keyushnisar/dating-app-behavior-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 11, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Keyush nisar
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a synthetic representation of user behavior on a fictional dating app. It contains 50,000 records with 19 features capturing demographic details, app usage patterns, swipe tendencies, and match outcomes. The data was generated programmatically to simulate realistic user interactions, making it ideal for exploratory data analysis (EDA), machine learning modeling (e.g., predicting match outcomes), or studying user behavior trends in online dating platforms.

Key features include gender, sexual orientation, location type, income bracket, education level, user interests, app usage time, swipe ratios, likes received, mutual matches, and match outcomes (e.g., "Mutual Match," "Ghosted," "Catfished"). The dataset is designed to be diverse and balanced, with categorical, numerical, and labeled variables for various analytical purposes.

Usage

This dataset can be used for:

Exploratory Data Analysis (EDA): Investigate correlations between demographics, app usage, and match success. Machine Learning: Build models to predict match outcomes or user engagement levels. Social Studies: Analyze trends in dating app behavior across different demographics. Feature Engineering Practice: Experiment with transforming categorical and numerical data.
d
Data from: Temporal and Cultural Limits of Privacy in Smartphone App Usage
data.dtu.dk
figshare.com
txt
Updated Jan 29, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laura Alessandretti (2021). Temporal and Cultural Limits of Privacy in Smartphone App Usage [Dataset]. http://doi.org/10.11583/DTU.13650797.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.11583/DTU.13650797.v1
Dataset updated
Jan 29, 2021
Dataset provided by
Technical University of Denmark
Authors
Laura Alessandretti
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The file anonymized_app_data.csv contains a sample of smartphone app-fingerprints from 20,000 randomly selected individuals, collected in May 2016.Each record in the table corresponds to a (user, app) pair, and reveals that a given app was used at least once by a given user during May 2016. The table contains the following field:user_id : hashed user idapp_id: hashed id the smartphone app The data accompanies the publication: "Temporal and Cultural Limits of Privacy in Smartphone App Usage"
o
Data from: Google Play Store Dataset
opendatabay.com
.undefined
Updated Jun 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data (2025). Google Play Store Dataset [Dataset]. https://www.opendatabay.com/data/premium/33624898-8133-421d-9b3b-42f76e1e4fe2
Explore at:
.undefinedAvailable download formats
Dataset updated
Jun 15, 2025
Dataset authored and provided by
Bright Data
Area covered
Website Analytics & User Experience
Description
Google Play Store dataset to explore detailed information about apps, including ratings, descriptions, updates, and developer details. Popular use cases include app performance analysis, market research, and consumer behavior insights.

Use our Google Play Store dataset to explore detailed information about apps available on the platform, including app titles, developers, monetization features, user ratings, reviews, and more. This dataset also includes data on app descriptions, safety measures, download counts, recent updates, and compatibility, providing a complete overview of app performance and features.

Tailored for app developers, marketers, and researchers, this dataset offers valuable insights into user preferences, app trends, and market dynamics. Whether you're optimizing app development, conducting competitive analysis, or tracking app performance, the Google Play Store dataset is an essential resource for making data-driven decisions in the mobile app ecosystem.

Dataset Features

url: The URL link to the app’s detail page on the Google Play Store.

title: The name of the application.

developer: The developer or company behind the app.

monetization_features: Information regarding how the app generates revenue (e.g., in-app purchases, ads).

images: Links or references to images associated with the app.

about: Details or a summary description of the app.

data_safety: Information regarding data safety and privacy practices.

rating: The overall rating of the app provided by its users.

number_of_reviews: The total count of user reviews received.

star_reviews: A breakdown of reviews by star ratings.

reviews: Reviews and user feedback about the app.

what_new: Information on the latest updates or features added to the app.

more_by_this_developer: Other apps by the same developer.

content_rating: The content rating which guides suitability based on user age.

downloads: The download count or range indicating the app’s popularity.

country: The country associated with the app listing.

app_category: The category or genre under which the app is classified.

Distribution

Data Volume: 17 Columns and 65.54M Rows

Format: CSV

Usage

This dataset is ideal for a variety of applications:

App Market Analysis: Enables market researchers to extract insights on app popularity, engagement, and trends across different categories.

Machine Learning: Can be used by data scientists to build recommendation engines or sentiment analysis models based on app review data.

User Behavior Studies: Facilitates academic or industrial research into user preferences and behavior with respect to mobile applications.

Coverage

Geographic Coverage: global.

License

CUSTOM Please review the respective licenses below: 1. Data Provider's License - Bright Data Master Service Agreement

Who Can Use It

Data Scientists: To train machine learning models for app popularity prediction, sentiment analysis, or recommendation systems.

Researchers: For academic or scientific studies into market trends, consumer behavior, and app performance analysis.

Businesses: For strategic analysis, developing market insights, or enhancing app development and user engagement strategies.

Suggested Dataset Name

Play store Insights

Android App Scope

Market Analytics

Play Store Metrics Vault

5. AppTrend360: Google Play Edition

Pricing

Based on Delivery frequency

~Up to $0.0025 per record. Min order $250

Approximately 10M new records are added each month. Approximately 13.8M records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.

Monthly

New snapshot each month, 12 snapshots/year Paid monthly

Quarterly

New snapshot each quarter, 4 snapshots/year Paid quarterly

Bi-annual

New snapshot every 6 months, 2 snapshots/year Paid twice-a-year

One-time purchase

New snapshot one-time delivery Paid once
i
Collect APP usage time data from the “App Usage” app
ieee-dataport.org
Updated May 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhi-Ren Tsai (2025). Collect APP usage time data from the “App Usage” app [Dataset]. https://ieee-dataport.org/documents/collect-app-usage-time-data-app-usage-app
Explore at:
Dataset updated
May 5, 2025
Authors
Zhi-Ren Tsai
Description
games
b
App Downloads Data (2025)
businessofapps.com
Updated Sep 1, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Business of Apps (2017). App Downloads Data (2025) [Dataset]. https://www.businessofapps.com/data/app-statistics/
Explore at:
Dataset updated
Sep 1, 2017
Dataset authored and provided by
Business of Apps
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
App Download Key StatisticsApp and Game DownloadsiOS App and Game DownloadsGoogle Play App and Game DownloadsGame DownloadsiOS Game DownloadsGoogle Play Game DownloadsApp DownloadsiOS App...
Data from: AndroCT: Ten Years of App Call Traces in Android
zenodo.org
explore.openaire.eu
application/gzip, txt
Updated Mar 7, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wen Li; Xiaoqin Fu; Haipeng Cai; Haipeng Cai; Wen Li; Xiaoqin Fu (2022). AndroCT: Ten Years of App Call Traces in Android [Dataset]. http://doi.org/10.5281/zenodo.4470320
Explore at:
application/gzip, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4470320
Dataset updated
Mar 7, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Wen Li; Xiaoqin Fu; Haipeng Cai; Haipeng Cai; Wen Li; Xiaoqin Fu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A large-scale dataset on the dynamic profiles based on function calls of 35,974 benign and malicious Android apps from 10 historical years (2010 through 2019). Function calls are a commonly used means to model program behaviors, which may contribute to various code analysis approaches to assuring software correctness, reliability, and security. In particular, our dataset includes dynamic profiles of each app resulting from the same-length of time (10 mins) of being exercised by randomly generated inputs on both emulator and real device, enabling interesting and useful app analysis that reason about app behaviors in an evolutionary perspective while informing the differences of app behaviors on different run-time hardware platforms. Since we have 20 yearly datasets associated with 35,974 unique Android apps across the 10 years, profiling these apps took 12,000 hours. Considering the costs of filtering out apps that were originally sampled but that we were unable to profile (due to various reasons such as broken APKs, not being executable because of incompatibility issues, not instrumentable, etc.), we took over two years to produce all these traces. We hope to save future researchers' time in producing such a set of dynamic data to enable their empirical and technical work.

==================

Thanks for your interest in our dataset. Collecting this dataset took tremendous computational and human effort. Thus, please observe the following restrictions in using our dataset:

- Do not redistribute this dataset without our consent.
- Do not make commercial usage of this dataset.
- Get a faculty, or someone in a permanent position, to agree and commit to these conditions.
- When publishing your work that uses our dataset, please cite the following MSR 2021 data paper.

@inproceedings{AndroidCT,
title = {AndroCT: Ten Years of App Call Traces in Android},
author = {Wen Li, Xiaoqin Fu, and Haipeng Cai},
booktitle = {The 18th International Conference on Mining Software Repositories (MSR 2021), Data Showcase Track},
year = {2021},
}
H
Worldwide Mobile App User Behavior Dataset
dataverse.harvard.edu
doc, xlsx
Updated Sep 28, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harvard Dataverse (2014). Worldwide Mobile App User Behavior Dataset [Dataset]. http://doi.org/10.7910/DVN/27459
Explore at:
doc(56320), xlsx(7037534)Available download formats
Unique identifier
https://doi.org/10.7910/DVN/27459
Dataset updated
Sep 28, 2014
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2012
Area covered
Worldwide
Description
We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include USA, China, Japan, Germany, France, Brazil, UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about: (1) their mobile app user behavior in terms of mobile app usage, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download. (2) their demographics including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income (3) their personality using the Big-Five personality traits This dataset contains the results of the survey.
Data used by new top app downloads worldwide 2018-2020
statista.com
Updated Jul 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2021). Data used by new top app downloads worldwide 2018-2020 [Dataset]. https://www.statista.com/statistics/1127265/data-used-top-apps-downloads-worldwide/
Explore at:
Dataset updated
Jul 6, 2021
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
In the first quarter of 2020, a total of 596 petabytes of data were used by new downloads of the top 250 apps worldwide. In comparison, in the same quarter of the preceding year, only 446 petabytes of data was used by new downloads of the most popular apps. The increase in data used by app downloads is a result of the coronavirus pandemic, during which stay-home recommendations were common worldwide.
Data from: Apple App Store Dataset
opendatabay.com
.other
Updated Jun 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data (2025). Apple App Store Dataset [Dataset]. https://www.opendatabay.com/data/premium/cd5a7748-e9da-4d59-96cd-96a0c95f7994
Explore at:
.otherAvailable download formats
Dataset updated
Jun 7, 2025
Dataset authored and provided by
Bright Datahttps://brightdata.com/
Area covered
Website Analytics & User Experience
Description
Apple App Store dataset to explore detailed information on app popularity, user feedback, and monetization features. Popular use cases include market trend analysis, app performance evaluation, and consumer behavior insights in the mobile app ecosystem.

Use our Apple App Store dataset to gain comprehensive insights into the mobile app ecosystem, including app popularity, user ratings, monetization features, and user feedback. This dataset covers various aspects of apps, such as descriptions, categories, and download metrics, offering a full picture of app performance and trends.

Tailored for marketers, developers, and industry analysts, this dataset allows you to track market trends, identify emerging apps, and refine promotional strategies. Whether you're optimizing app development, analyzing competitive landscapes, or forecasting market opportunities, the Apple App Store dataset is an essential tool for making data-driven decisions in the ever-evolving mobile app industry.

Dataset Features

url: The URL linking to the app’s page on the Apple App Store.

title: The name of the app.

sub_title: A brief subtitle or tagline for the app.

developer: The name of the entity or individual that developed the app.

top_charts: Indicates if the app appears in top charts.

monetization_features: Information on monetization aspects (such as in-app purchases or advertisements).

image: A reference to the main app image.

screenshots: Contains screenshot images of the app.

description: Detailed app description outlining main features.

what_new: Details on the latest updates or new features.

rating: The overall rating based on user reviews.

number_of_raters: The total number of users who have rated the app.

reviews_by_stars: Breakdown of the number of reviews by star rating.

reviews: An aggregation of user reviews.

events: Any associated events or promotions.

data_linked_to_you: Indicates if any data is linked to the user.

seller: The entity responsible for selling or distributing the app.

category: The category or genre of the app.

languages: Languages supported by the app.

copyright: Copyright information provided by the developer.

size: The file size of the app.

compatibility: Device or OS compatibility details.

age_rating: The recommended age rating for the app.

price: The price of the app.

In_app_purchases: Details on in-app purchase options.

support: Information related to app support.

more_by_this_developer: Suggestions for other apps by the same developer.

you_might_also_like: Recommendations for similar apps.

app_support: Additional support details.

privacy_policy: Link or reference to the app’s privacy policy.

developer_website: The website of the app developer.

featured_in: Information on any features or showcases the app has being part of.

country: The country from which the app’s data was sourced.

timestamp: A timestamp indicating when the data record was last updated.

latest_app_version: The most recent version of the app available.

app_id: A unique identifier for the app.

Distribution

Data Volume: 36 Columns and 68M Rows

Format: CSV

Usage

This dataset is versatile and can be used for various applications: - Market Analysis: Analyze app pricing strategies, monetization features, and category distribution to understand market trends and opportunities in the App Store. This can help developers and businesses make informed decisions about their app development and pricing strategies. - User Experience Research: Study the relationship between app ratings, number of reviews, and app features to understand what drives user satisfaction. The detailed review data and ratings can provide insights into user preferences and pain points. - Competitive Intelligence: Track and analyze apps within specific categories, comparing features, pricing, and user engagement metrics to identify successful patterns and market gaps. Particularly useful for developers planning new apps or improving existing ones. - Performance Prediction: Build predictive models using features like app size, category, pricing, and language support to forecast potential app success metrics. This can help in making data-driven decisions during app development. - Localization Strategy: Analyze the languages supported and regional performance to inform decisions about app localization and international market expansion.

Coverage

Geographic Coverage: Global

License

CUSTOM Please review the respective licenses below: 1. Data Provider's License - Bright Data Master Service Agreement

Who Can Use It

Data Scientists: Can leverage this dataset for training machine learning algorithms and building predictive models concerning app tr
Google Play Store Apps
kaggle.com
Updated Feb 3, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lavanya (2019). Google Play Store Apps [Dataset]. https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 3, 2019
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Lavanya
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
[ADVISORY] IMPORTANT

Instructions for citation:

If you use this dataset anywhere in your work, kindly cite as the below: L. Gupta, "Google Play Store Apps," Feb 2019. [Online]. Available: https://www.kaggle.com/lava18/google-play-store-apps

Context

While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found out that iTunes App Store page deploys a nicely indexed appendix-like structure to allow for simple and easy web scraping. On the other hand, Google Play Store uses sophisticated modern-day techniques (like dynamic page load) using JQuery making scraping more challenging.

Content

Each app (row) has values for catergory, rating, size, and more.

Acknowledgements

This information is scraped from the Google Play Store. This app information would not be available without it.

Inspiration

The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!

Myket Android Application Install Dataset

paperswithcode.com

Updated Aug 12, 2023

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Erfan Loghmani; Mohammadamin Fazli (2023). Myket Android Application Install Dataset [Dataset]. https://paperswithcode.com/dataset/myket-android-application-install

Explore at:

Dataset updated

Aug 12, 2023

Authors

Erfan Loghmani; Mohammadamin Fazli

Description

This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.

Data Creation The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani a master student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.

We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found in here.

Data Structure The dataset has two main files.

myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, resulting in associated columns being all zero. app_info_sample.csv: This file comprises features associated with applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category are included. These features provide insights into the applications present in the dataset.

Dataset Details

Total Instances: 694,121 install interaction instances Instances Format: Triplets of user_id, app_name, timestamp 10,000 users and 7,988 android applications Item features for 7,606 applications

For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.

Top 20 Most Installed Applications | Package Name | Count of Interactions | | ---------------------------------- | --------------------- | | com.instagram.android | 15292 | | ir.resaneh1.iptv | 12143 | | com.tencent.ig | 7919 | | com.ForgeGames.SpecialForcesGroup2 | 7797 | | ir.nomogame.ClutchGame | 6193 | | com.dts.freefireth | 6041 | | com.whatsapp | 5876 | | com.supercell.clashofclans | 5817 | | com.mojang.minecraftpe | 5649 | | com.lenovo.anyshare.gps | 5076 | | ir.medu.shad | 4673 | | com.firsttouchgames.dls3 | 4641 | | com.activision.callofduty.shooter | 4357 | | com.tencent.iglite | 4126 | | com.aparat | 3598 | | com.kiloo.subwaysurf | 3135 | | com.supercell.clashroyale | 2793 | | co.palang.QuizOfKings | 2589 | | com.nazdika.app | 2436 | | com.digikala | 2413 |

Comparison with SNAP Datasets The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the project. The table below provides a comparative overview of the key dataset characteristics:

Dataset	#Users	#Items	#Interactions	Average Interactions per User	Average Unique Items per User
Myket	10,000	7,988	694,121	69.4	54.6
LastFM	980	1,000	1,293,103	1,319.5	158.2
Reddit	10,000	984	672,447	67.2	7.9
Wikipedia	8,227	1,000	157,474	19.1	2.2
MOOC	7,047	97	411,749	58.4	25.3

The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.

Citation If you use this dataset in your research, please cite the following preprint:

@misc{loghmani2023effect, title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks}, author={Erfan Loghmani and MohammadAmin Fazli}, year={2023}, eprint={2308.06862}, archivePrefix={arXiv}, primaryClass={cs.LG} }

n
Data for health app usage BCT and UG integration
narcis.nl
uvaauas.figshare.com
Updated Feb 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
G.J. de Bruijn (2020). Data for health app usage BCT and UG integration [Dataset]. http://doi.org/10.21942/uva.11798580
Explore at:
Unique identifier
https://doi.org/10.21942/uva.11798580
Dataset updated
Feb 4, 2020
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
G.J. de Bruijn
Description
These data are for the publication on health app usage in Dutch adults (collected May 2018) using BCT and UG concepts
U.S. entertainment app data usage 2017
statista.com
Updated Jan 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2022). U.S. entertainment app data usage 2017 [Dataset]. https://www.statista.com/statistics/724906/entertainment-app-data-usage/
Explore at:
Dataset updated
Jan 27, 2022
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jun 2017
Area covered
United States
Description
This statistic presents the average monthly data usage of selected entertainment apps in the United States as of June 2017. During the measured period, the average data usage of Netflix amounted to 2,889.5 MB per month.
Dataset: Gold standard dataset for explainability need detection in app...
zenodo.org
zip
Updated May 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin Obaidi; Martin Obaidi (2025). Dataset: Gold standard dataset for explainability need detection in app reviews. [Dataset]. http://doi.org/10.5281/zenodo.13273192
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13273192
Dataset updated
May 20, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Martin Obaidi; Martin Obaidi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We crawled 90,000 app reviews from both Google Play Store and Apple App Store, including reviews from both free and paid apps. These reviews were filtered for explainability needs, and after this process, 4,495 reviews remained. Among them, 2,185 reviews indicated an explanation need, while 2,310 did not. This resulting gold standard dataset was used to train and evaluate several machine learning models and rule-based approaches for detecting explanation needs in app reviews.

The dataset includes both balanced and unbalanced evaluation sets, as well as the original crawled data from October 2023. In addition to machine learning approaches, rule-based methods optimized for F1 score, precision, and recall are also included.

We provide several pre-trained machine learning models (including BERT, SetFit, AdaBoost, K-Nearest Neighbor, Logistic Regression, Naive Bayes, Random Forest, and SVM) along with training scripts and evaluation notebooks. These models can be applied directly or retrained using the included datasets.

For further details on the structure and usage of the dataset, please refer to the README.md file within the provided ZIP archive.
Z
Coronavirus-themed Mobile Apps (Malware) Dataset
data.niaid.nih.gov
zenodo.org
Updated Apr 21, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
covid19apps (2021). Coronavirus-themed Mobile Apps (Malware) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3875975
Explore at:
Dataset updated
Apr 21, 2021
Dataset authored and provided by
covid19apps
Description
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. It is reported that COVID-19 is being used in a variety of online malicious activities, including Email scam, ransomware and malicious domains. As the number of the afflicted cases continue to surge, malicious campaigns that use coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users to download and install malicious apps.

However, besides a few media reports, the coronavirus-themed mobile malware has not been well studied. Our community lacks of the comprehensive understanding of the landscape of the coronavirus-themed mobile malware, and no accessible dataset could be used by our researchers to boost COVID-19 related cybersecurity studies.

We make efforts to create a daily growing COVID-19 related mobile app dataset. By the time of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, and 611 of them are considered to be malicious. The number is growing daily and our dataset will update weekly. For more details, please visit https://covid19apps.github.io

This dataset includes the following files:

(1) covid19apps.xlsx

In this file, we list all the COVID-19 themed apps information, including apk file hashes, released date, package name, AV-Rank, etc.

(2)covid19apps.zip

We put the COVID-19 themed apps Apk samples in zip files . In order to reduce the size of a single file, we divide the sample into multiple zip files for storage. And the APK file name after the file SHA256.

If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619

(Accepted to Empirical Software Engineering)

@misc{wang2021virus, title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware}, author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu}, year={2021}, eprint={2005.14619}, archivePrefix={arXiv}, primaryClass={cs.CR} }
Global cellular data traffic used for apps 2025, by category
statista.com
ai-chatbox.pro
Updated Feb 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Global cellular data traffic used for apps 2025, by category [Dataset]. https://www.statista.com/statistics/383715/global-mobile-data-traffic-share/
Explore at:
Dataset updated
Feb 17, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Feb 2025
Area covered
Worldwide
Description
As of February 2025, video apps accounted for around 76 percent of global mobile data usage every month. Second-ranked social networking accounted for eight percent of global mobile data volume. The two categories, though, can easily overlap, as users can watch videos via video applications, as well as on social networking applications. Most popular social media platforms with video content Facebook, YouTube, and Instagram were among the most popular social networks in the world, as of October 2021. Each of these platforms allow to post, share, and watch video content on a mobile device. One of the fastest growing global brands, Tiktok, is also a social media platform where users can share video content. In September 2021, the platform reached 1 billion monthly active users. Leading types of mobile video content in the U.S. The United States was the third country in the world based on the number of smartphone users as of May 2021, with around 270 million users. Therefore, mobile content usage in the country was one of the highest in the world, and a big part of it was video content. As of the third quarter of 2021, more than 80 percent of survey respondents in the United States reported watching YouTube on their mobile devices. Social media videos were the second most popular type of content for mobile audiences, with almost six in 10 respondents watching videos on social media platforms like TikTok and Twitter.
Average data use of leading navigation apps in the U.S. 2020
statista.com
Updated Nov 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2022). Average data use of leading navigation apps in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/1186009/data-use-leading-us-navigation-apps/
Explore at:
Dataset updated
Nov 30, 2022
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Oct 2020
Area covered
United States
Description
As of October 2020, the average amount of mobile data used by Apple Maps per 20 minutes was 1.83 MB, while Google maps used only 0.73 MB. Waze, which is also owned by Google, used the least amount at 0.23 MB per 20 minutes.
Social media usage in University
kaggle.com
Updated Feb 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IttiphoN (2025). Social media usage in University [Dataset]. https://www.kaggle.com/datasets/ittiphon/social-media-usage
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 8, 2025
Dataset provided by
Kaggle
Authors
IttiphoN
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset consists of daily data collected from users of various social media apps. It includes information such as the number of active users, the frequency of user interactions, and the time spent on each app daily. The data is categorized by individual apps, allowing insights into user behavior and engagement patterns on a per-day basis.

Facebook

Twitter

Click to copy link

Link copied

Cite

Mohamed Moslemani (2025). User mobile app interaction data [Dataset]. https://www.kaggle.com/datasets/mohamedmoslemani/user-mobile-app-interaction-data

User mobile app interaction data

Generated interaction data of users on the mobile phone with an Application -

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jan 15, 2025

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Mohamed Moslemani

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.

Key Features Included

User & Session Metadata

User ID: A unique integer identifier for each synthetic user. Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions. IP Address: Fake IP addresses generated via Faker to simulate different network origins. Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred. Session Duration: An approximate measure (in seconds) of how long a user remained active. Device & Technical Details

Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers. Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.). Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”). Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G. Location & Locale

Location Country & City: Random global locations generated using Faker. App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.). User Properties

Battery Level: The phone’s battery level as a percentage (0–100). Memory Usage (MB): Approximate memory consumption at the time of the event. Subscription Status: Boolean flag indicating if the user is subscribed to a premium service. User Age: Random integer ranging from teenagers to seniors (13–80). Phone Number: Fake phone numbers generated via Faker. Push Enabled: Boolean flag indicating if the user has push notifications turned on. Event-Level Interactions

Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.). Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”). Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating). App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”). Data Quality & “Noise” To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:

Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.” Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data. Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong. These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.

Usage & Applications

Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario. Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc. Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification). Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).

Important Notes & Disclaimer

Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals. Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.

Clear search

Close search

Google apps

Main menu

User mobile app interaction data

Key Features Included

Usage & Applications

Important Notes & Disclaimer

LSApp: Large dataset of Sequential mobile App usage

Dating App Behavior Dataset 2025

Usage

Data from: Temporal and Cultural Limits of Privacy in Smartphone App Usage

Data from: Google Play Store Dataset

Dataset Features

Distribution

Usage

Coverage

License

Who Can Use It

Suggested Dataset Name

5. AppTrend360: Google Play Edition

Pricing

Based on Delivery frequency

Collect APP usage time data from the “App Usage” app

App Downloads Data (2025)

Data from: AndroCT: Ten Years of App Call Traces in Android

Worldwide Mobile App User Behavior Dataset

Data used by new top app downloads worldwide 2018-2020

Data from: Apple App Store Dataset

Dataset Features

Distribution

Usage

Coverage

License

Who Can Use It

Google Play Store Apps

[ADVISORY] IMPORTANT

Instructions for citation:

Context

Content

Acknowledgements

Inspiration

Myket Android Application Install Dataset

Data for health app usage BCT and UG integration

U.S. entertainment app data usage 2017

Dataset: Gold standard dataset for explainability need detection in app...

Coronavirus-themed Mobile Apps (Malware) Dataset

Global cellular data traffic used for apps 2025, by category

Average data use of leading navigation apps in the U.S. 2020

Social media usage in University

User mobile app interaction data

Generated interaction data of users on the mobile phone with an Application -

Key Features Included

Usage & Applications

Important Notes & Disclaimer