48 datasets found

Google Play Store Apps Dataset

kaggle.com

zip

Updated Oct 30, 2024

Cite

Yusuf Delikkaya (2024). Google Play Store Apps Dataset [Dataset]. https://www.kaggle.com/datasets/yusufdelikkaya/google-play-store-apps-dataset/code

Explore at:

Available download formats: zip (319016 bytes)

Dataset updated

Oct 30, 2024

Authors

Yusuf Delikkaya

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Description:

The dataset comprises anonymized data on apps available on the Google Play Store, capturing various aspects such as ratings, downloads, and categorization.
The dataset has 10,841 entries, with some columns containing missing values, particularly in "Rating," "Type," "Content Rating," "Current Ver," and "Android Ver".
It can be used to analyze trends in mobile app usage, user preferences, and app performance across categories, including how factors such as app size, price, rating, and category relate to downloads and popularity.
It can also help identify trends in app categories, assess user satisfaction through ratings and reviews, and give app developers and marketers insight into user engagement and app performance.

Features:

Column Name	Description
App	The name of the app as listed on the Google Play Store.
Category	The category to which the app belongs (e.g., ART_AND_DESIGN, GAME).
Rating	The user rating of the app on a scale from 1 to 5.
Reviews	The number of user reviews for the app.
Size	The size of the app in megabytes (MB) or kilobytes (KB).
Installs	The number of installs/downloads of the app (e.g., 10,000+).
Type	Indicates whether the app is free or paid.
Price	The price of the app in USD, if it is a paid app.
Content Rating	The target audience for the app (e.g., Everyone, Teen, Mature 17+).
Genres	The genres associated with the app (e.g., Art & Design, Creativity).
Last Updated	The date when the app was last updated.
Current Ver	The current version of the app.
Android Ver	The minimum Android version required to run the app.
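
The Size, Installs, and Price columns described above are text fields that usually need to be converted to numbers before analysis. The following is a minimal sketch, not the dataset author's method. It assumes values shaped like "19M" or "512k" for Size, "10,000+" for Installs, and "$4.99" or "0" for Price; only the "10,000+" shape is confirmed by the table above, and the helper names are hypothetical.

```python
import re
from typing import Optional


def parse_installs(text: str) -> Optional[int]:
    """Convert an install bucket such as '10,000+' to the integer 10000."""
    digits = re.sub(r"[,+\s]", "", text)
    return int(digits) if digits.isdigit() else None


def parse_size_mb(text: str) -> Optional[float]:
    """Convert a size string to megabytes (assumed 'M' and 'k' suffixes)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([MmKk])[Bb]?\s*", text)
    if match is None:
        return None  # non-numeric placeholder or missing value
    value = float(match.group(1))
    # Kilobytes are scaled down to megabytes; megabytes pass through.
    return value if match.group(2).upper() == "M" else value / 1024


def parse_price_usd(text: str) -> Optional[float]:
    """Convert a price such as '$4.99' or '0' to a float in USD."""
    try:
        return float(text.strip().lstrip("$"))
    except ValueError:
        return None
```

Returning None for unparseable values keeps missing or placeholder entries distinguishable from a genuine zero, which matters because the description notes that several columns contain missing values.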

Play Store Apps

kaggle.com

Updated Sep 16, 2022

+ more versions

Cite

Aman Chauhan (2022). Play Store Apps [Dataset]. https://www.kaggle.com/datasets/whenamancodes/play-store-apps

Explore at:

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Sep 16, 2022

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Aman Chauhan

License

Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically

Description

While many public datasets (on Kaggle and the like) provide Apple App Store data, few counterpart datasets are available for Google Play Store apps anywhere on the web. On digging deeper, I found that the iTunes App Store page uses a nicely indexed, appendix-like structure that allows for simple and easy web scraping. The Google Play Store, on the other hand, uses more sophisticated techniques (like dynamic page loading) built on jQuery, which makes scraping more challenging.

Each app (row) has values for category, rating, size, and more.

The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!

googleplaystore.csv

Columns	Description
App	Application name
Category	Category the app belongs to
Ratings	Overall user rating of the app (as when scraped)
Reviews	Number of user reviews for the app (as when scraped)
Size	Size of the app (as when scraped)
Installs	Number of user downloads/installs for the app (as when scraped)
Type	Paid or Free
Price	Price of the app (as when scraped)
Content Rating	Age group the app is targeted at - Children / Mature 21+ / Adult
Genre	An app can belong to multiple genres (apart from its main category). For eg, a musical family game will belong to
Current Ver	Current version of the app available on Play Store (as when scraped)
Android Ver	Min required Android version (as when scraped)

googleplaystore_user_reviews.csv

Columns	Description
App	Name of app
Translated Reviews	User review (Preprocessed and translated to English)
Sentiment	Positive/Negative/Neutral (Preprocessed)
Sentiment_polarity	Sentiment polarity score
Sentiment_subjectivity	Sentiment subjectivity score
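
The two files share the App column, so review sentiment can be summarized per app and then looked up against the app table. Below is a minimal sketch using only the standard library. The inline sample rows are made up for illustration, the function name is hypothetical, and the assumption that missing scores appear as empty strings or "nan" is not confirmed by the listing.

```python
import csv
import io
import math
from collections import defaultdict


def mean_polarity_by_app(rows):
    """Average Sentiment_polarity per App, skipping missing or NaN scores."""
    totals = defaultdict(lambda: [0.0, 0])  # app -> [sum, count]
    for row in rows:
        raw = (row.get("Sentiment_polarity") or "").strip()
        try:
            score = float(raw)
        except ValueError:
            continue  # empty or non-numeric score
        if math.isnan(score):
            continue  # the literal text "nan" parses as a float, so drop it
        totals[row["App"]][0] += score
        totals[row["App"]][1] += 1
    return {app: total / count for app, (total, count) in totals.items()}


# Made-up sample in the shape of googleplaystore_user_reviews.csv.
sample = io.StringIO(
    "App,Sentiment,Sentiment_polarity\n"
    "Demo App,Positive,0.5\n"
    "Demo App,Negative,-0.25\n"
    "Demo App,nan,nan\n"
    "Other App,Neutral,0.0\n"
)
averages = mean_polarity_by_app(csv.DictReader(sample))
```

To use the real file, the `io.StringIO` sample would be replaced by an opened `googleplaystore_user_reviews.csv`, and the resulting dictionary could be looked up by the App value of each row in `googleplaystore.csv`.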


Screen Time and App Usage Dataset (iOS/Android)
kaggle.com
zip
Updated Apr 19, 2025
Cite
Khushi Yadav (2025). Screen Time and App Usage Dataset (iOS/Android) [Dataset]. https://www.kaggle.com/datasets/khushikyad001/screen-time-and-app-usage-dataset-iosandroid
Explore at:
Available download formats: zip (157038 bytes)
Dataset updated
Apr 19, 2025
Authors
Khushi Yadav
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset simulates anonymized mobile screen time and app usage data collected from Android/iOS users over a 3-month period (Jan–April 2024). It captures daily usage trends across various app categories including:

Productivity: Google Docs, Notion, Slack

Entertainment: YouTube, Netflix, TikTok

Social Media: Instagram, WhatsApp, Facebook

Utilities: Chrome, Gmail, Maps

For YouTube, additional engagement statistics such as views, likes, and comments are included to analyze video popularity and content consumption behavior.

The dataset enables exploration of:

Productivity vs. entertainment screen time patterns

Daily usage fluctuations

App-specific user engagement

Correlation between time spent and user interactions

YouTube content virality metrics

This is a great resource for:

EDA projects

Behavioral clustering

Dashboard development

Time series and anomaly detection

Building recommendation or focus-assistive apps
Data from: Hall-of-Apps: The Top Android Apps Metadata Archive
zenodo.org
data-staging.niaid.nih.gov
+1more
bz2, zip
Updated Mar 20, 2020
Cite
Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez (2020). Hall-of-Apps: The Top Android Apps Metadata Archive [Dataset]. http://doi.org/10.5281/zenodo.3716367
Explore at:
Available download formats: zip, bz2
Unique identifier
https://doi.org/10.5281/zenodo.3716367
Dataset updated
Mar 20, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The number of Android apps available for download is constantly increasing, exerting continuous pressure on developers to publish outstanding apps. Google Play (GP) is the default distribution channel for Android apps, and it provides mobile app users with metrics to identify and report app quality, such as rating, number of downloads, previous users' comments, etc. In addition to those metrics, GP presents a set of top charts that highlight the outstanding apps in different categories. Both metrics and top app charts help developers to identify whether their development decisions are well valued by the community. Therefore, app presence in these top charts is valuable information for understanding the features of top apps. In this paper we present Hall-of-Apps, a dataset containing top charts' apps metadata extracted (weekly) from GP, for 4 different countries, during 30 weeks. The data is presented as (i) raw HTML files, (ii) a MongoDB database with all the information contained in each app's HTML files (e.g., app description, category, general rating, etc.), and (iii) data visualizations built with the D3.js framework. A first characterization of the data, along with the URLs to retrieve it, can be found in our online appendix: https://thesoftwaredesignlab.github.io/hall-of-apps-tools/
Dataset used for "A Recommender System of Buggy App Checkers for App Store...
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jun 28, 2021
Cite
Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
Explore at:
Dataset updated
Jun 28, 2021
Dataset provided by
University of Lille / Inria
Authors
Maria Gomez; Romain Rouvoy; Martin Monperrus; Lionel Seinturier
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

Dataset Collection

We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing and/or updating apps), we updated the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. Then, we created a new snapshot (D2) of the Google Play Store in March 2014.

The apps belong to the 27 different categories defined by Google (at the time of writing the paper), and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median number of 1,978 apps per category.

For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.

In addition, for each app, we collected up to the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered the reviews associated with the latest version of the app (i.e., we discarded unversioned and old-versioned reviews). This resulted in a corpus of 1,402,717 reviews (Jan. 2014).

Dataset Stats

Some stats about the datasets:

D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.

Additional stats about the datasets are available here.

Dataset Description

To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).

In particular, our dataset graph contains six types of nodes:

APP nodes containing metadata of each app

PERMISSION nodes describing permission types

CATEGORY nodes describing app categories

SUBCATEGORY nodes describing app subcategories

USER_REVIEW nodes storing user reviews

TOPIC nodes describing topics mined from user reviews (using LDA)

Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:

USES_PERMISSION relationships between APP and PERMISSION nodes

HAS_REVIEW between APP and USER_REVIEW nodes

HAS_TOPIC between USER_REVIEW and TOPIC nodes

BELONGS_TO_CATEGORY between APP and CATEGORY nodes

BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes
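
The description mentions finding clusters of apps that share similar sets of permission requests. One simple way to measure that similarity, once the USES_PERMISSION edges have been exported from the graph, is the Jaccard index between permission sets. The sketch below is an illustration only and not the method used in the paper; the package names, permission names, function names, and threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def similar_app_pairs(uses_permission: dict, threshold: float = 0.5):
    """Return (app_a, app_b, score) for app pairs at or above the threshold."""
    pairs = []
    for app_a, app_b in combinations(sorted(uses_permission), 2):
        score = jaccard(uses_permission[app_a], uses_permission[app_b])
        if score >= threshold:
            pairs.append((app_a, app_b, score))
    return pairs


# Hypothetical export of USES_PERMISSION edges: app -> set of permissions.
edges = {
    "com.example.a": {"INTERNET", "CAMERA"},
    "com.example.b": {"INTERNET", "CAMERA", "READ_CONTACTS"},
    "com.example.c": {"VIBRATE"},
}
pairs = similar_app_pairs(edges, threshold=0.5)
```

Comparing every pair of apps grows quadratically with the number of apps, so for tens of thousands of apps the comparison would likely need to be restricted, for example to apps within the same category.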

Dataset Files Info

Neo4j 2.0 Databases

googlePlayDB1-Jan2014_neo4j_2_0.rar

googlePlayDB2-Mar2014_neo4j_2_0.rar

We provide two Neo4j databases containing the 2 snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. The databases were created with Neo4j 2.0, specifically with the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available when the paper was implemented in 2014).

Neo4j 3.5 Databases

googlePlayDB1-Jan2014_neo4j_3_5_28.rar

googlePlayDB2-Mar2014_neo4j_3_5_28.rar

Neo4j 2.0 is now deprecated and is not available for download in the official Neo4j Download Center. We have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28'. The tool can be downloaded from the official Neo4j Download page.

To open the databases with more recent versions of Neo4j, the databases must first be migrated to the corresponding version. Instructions for the migration process can be found in the Neo4j Migration Guide. The first time the Neo4j database is connected, it may request credentials. The username and password are: neo4j/neo4j
RICO dataset
kaggle.com
zip
Updated Dec 1, 2021
Cite
Onur Gunes (2021). RICO dataset [Dataset]. https://www.kaggle.com/datasets/onurgunes1993/rico-dataset
Explore at:
Available download formats: zip (6703669364 bytes)
Dataset updated
Dec 1, 2021
Authors
Onur Gunes
Description
Context

Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.

Content

Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.

Acknowledgements

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN https://interactionmining.org/rico

Inspiration

The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
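
Example-based search over the 64-dimensional layout vectors can be sketched as a nearest-neighbor lookup. The snippet below is a minimal illustration under stated assumptions: it uses Euclidean distance (the description does not say which metric the authors used), random vectors stand in for the real embeddings, and the UI identifiers and function name are hypothetical.

```python
import math
import random


def nearest_layouts(query, embeddings, k=3):
    """Return the ids of the k UIs whose vectors are closest to the query."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: math.dist(query, item[1]),  # Euclidean distance
    )
    return [ui_id for ui_id, _ in ranked[:k]]


# Stand-in data: 20 random 64-dimensional vectors keyed by made-up UI ids.
rng = random.Random(0)
embeddings = {
    f"ui_{i}": [rng.uniform(-1.0, 1.0) for _ in range(64)] for i in range(20)
}

# Querying with an existing UI's own vector returns that UI first,
# followed by its closest neighbors.
matches = nearest_layouts(embeddings["ui_7"], embeddings, k=3)
```

A linear scan like this is fine for a few thousand screens; for the full set of more than 66k UI screens, an approximate nearest-neighbor index would likely be a better fit.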
Data from: AndroCT: Ten Years of App Call Traces in Android
zenodo.org
explore.openaire.eu
application/gzip, txt
Updated Mar 8, 2022
Cite
Wen Li; Xiaoqin Fu; Haipeng Cai (2022). AndroCT: Ten Years of App Call Traces in Android [Dataset]. http://doi.org/10.5281/zenodo.6336104
Explore at:
Available download formats: application/gzip, txt
Unique identifier
https://doi.org/10.5281/zenodo.6336104
Dataset updated
Mar 8, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Wen Li; Xiaoqin Fu; Haipeng Cai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A large-scale dataset on the dynamic profiles based on function calls of 35,974 benign and malicious Android apps from 10 historical years (2010 through 2019). Function calls are a commonly used means to model program behaviors, which may contribute to various code analysis approaches to assuring software correctness, reliability, and security. In particular, our dataset includes dynamic profiles of each app resulting from the same-length of time (10 mins) of being exercised by randomly generated inputs on both emulator and real device, enabling interesting and useful app analysis that reason about app behaviors in an evolutionary perspective while informing the differences of app behaviors on different run-time hardware platforms. Since we have 20 yearly datasets associated with 35,974 unique Android apps across the 10 years, profiling these apps took 12,000 hours. Considering the costs of filtering out apps that were originally sampled but that we were unable to profile (due to various reasons such as broken APKs, not being executable because of incompatibility issues, not instrumentable, etc.), we took over two years to produce all these traces. We hope to save future researchers' time in producing such a set of dynamic data to enable their empirical and technical work.

==================

Thanks for your interest in our dataset. Collecting this dataset took tremendous computational and human effort. Thus, please observe the following restrictions in using our dataset:

- Do not redistribute this dataset without our consent.
- Do not make commercial usage of this dataset.
- Get a faculty, or someone in a permanent position, to agree and commit to these conditions.
- When publishing your work that uses our dataset, please cite the following MSR 2021 data paper.

@inproceedings{AndroidCT,
title = {AndroCT: Ten Years of App Call Traces in Android},
author = {Wen Li and Xiaoqin Fu and Haipeng Cai},
booktitle = {The 18th International Conference on Mining Software Repositories (MSR 2021), Data Showcase Track},
year = {2021},
}
iOS and Android app analysis data
figshare.com
txt
Updated Oct 16, 2020
Cite
Kristiina Rahkema (2020). iOS and Android app analysis data [Dataset]. http://doi.org/10.6084/m9.figshare.13103012.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13103012.v1
Dataset updated
Oct 16, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kristiina Rahkema
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CSV file with code smell occurrences per application. One file for iOS and one for Android. Analysis of open source applications.
Coronavirus-themed Mobile Apps (Malware) Dataset
zenodo.org
data.niaid.nih.gov
+1more
Updated Apr 21, 2021
Cite
covid19apps (2021). Coronavirus-themed Mobile Apps (Malware) Dataset [Dataset]. http://doi.org/10.5281/zenodo.3875976
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3875976
Dataset updated
Apr 21, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
covid19apps
Description
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of online malicious activities, including email scams, ransomware, and malicious domains. As the number of cases continues to surge, malicious campaigns that use the coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps.

However, apart from a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset is available to researchers to support COVID-19 related cybersecurity studies.

We are working to create a daily growing COVID-19 related mobile app dataset. By mid-November, we had curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered to be malicious. The number is growing daily and our dataset will be updated weekly. For more details, please visit https://covid19apps.github.io

This dataset includes the following files:

(1) covid19apps.xlsx

In this file, we list information about all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.

(2) covid19apps.zip

We put the APK samples of the COVID-19 themed apps in zip files. To reduce the size of any single file, we divide the samples into multiple zip files for storage. Each APK file is named after its SHA256 hash.
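
Because each APK is named after its SHA256 hash, a downloaded sample can be checked for corruption by recomputing the hash and comparing it with the file name. The following is a minimal sketch; it assumes file names of the form `<sha256>.apk`, which is an assumption about the exact naming that the description does not spell out, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path):
    """Check that the file name (without extension) equals the file's hash."""
    return Path(path).stem.lower() == sha256_of_file(path)
```

Since roughly one in seven of the listed apps is flagged as malicious, verification like this should only read the files as bytes and never install or execute them.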

If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619

(Accepted to Empirical Software Engineering)

@misc{wang2021virus,
title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware},
author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu},
year={2021},
eprint={2005.14619},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Google Play Store Category wise Top 500 Apps
kaggle.com
zip
Updated Feb 1, 2022
Cite
Shakthi Dhar (2022). Google Play Store Category wise Top 500 Apps [Dataset]. https://www.kaggle.com/datasets/shakthidhar/google-play-store-category-wise-top-500-apps
Explore at:
Available download formats: zip (474438 bytes)
Dataset updated
Feb 1, 2022
Authors
Shakthi Dhar
License
https://cdla.io/permissive-1-0/
Description
Context

Data on the top 500 Google Play Store apps in every available category, based on their rankings in January 2022. Link to scraping code: https://github.com/Shakthi-Dhar/AppPin Link to backup datafiles: github data files

Content

The dataset contains the top 500 Android apps available on the Google Play Store for the following categories: All Categories, Art & Design, Auto & Vehicles, Beauty, Books & Reference, Business, Comics, Communication, Education, Entertainment, Events, Finance, Food & Drink, Health & Fitness, House & Home, Libraries & Demo, Lifestyle, Maps & Navigation, Medical, Music & Audio, News & Magazines, Parenting, Personalization, Photography, Productivity, Shopping, Social, Sports, Tools, Travel & Local, and Video Players & Editors.

The app rankings are based on Google Play Store app rankings for January 2022.

Abbreviations

In Review and Downloads, the letters T, L, and Cr represent Thousands, Lakhs, and Crores, as per the Google Play Store naming convention. They are similar to M and B, which represent millions and billions.

1L (1 Lakh) = 100T (100 Thousand)
10L (10 Lakhs) = 1M (1 Million)
1Cr (1 Crore) = 10M (10 Million)
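
The abbreviations above can be turned into plain integers with a small lookup table. This is a minimal sketch that assumes values are written as a number immediately followed by one of the suffixes T, L, Cr, M, or B (for example "1L" or "2.5Cr"); the exact formatting in the data files may differ, and the function name is hypothetical.

```python
import re

# Multipliers for the suffixes described in the Abbreviations section.
MULTIPLIERS = {
    "T": 1_000,            # Thousand
    "L": 100_000,          # Lakh
    "M": 1_000_000,        # Million
    "Cr": 10_000_000,      # Crore
    "B": 1_000_000_000,    # Billion
}


def parse_count(text: str) -> int:
    """Convert a count such as '1L', '2.5Cr', or '750' to an integer."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(Cr|T|L|M|B)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    number, suffix = match.groups()
    # A missing suffix means the number is already a plain count.
    return round(float(number) * MULTIPLIERS.get(suffix, 1))
```

The equivalences listed in the description make a convenient sanity check: `parse_count("10L")` and `parse_count("1M")` should agree, as should `parse_count("1Cr")` and `parse_count("10M")`.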

Acknowledgements

This data is not provided directly by Google, so I used Appium, an automation tool, with Python to scrape the data from the Google Play Store app.

Inspiration

Inspired by Fortune500. Fortune500 provides data on top companies in the world, so why not have a data source for top apps in the world.
Aos All Apps Dataset
universe.roboflow.com
zip
Updated May 20, 2023
Cite
Rico (2023). Aos All Apps Dataset [Dataset]. https://universe.roboflow.com/rico-tqjvo/aos-all-apps/model/1
Explore at:
Available download formats: zip
Dataset updated
May 20, 2023
Dataset authored and provided by
Rico
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Android Apps Bounding Boxes
Description
AOS All Apps

Overview

AOS All Apps is a dataset for object detection tasks. It contains Android Apps annotations for 250 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license.
Android Process Memory String Dumps Dataset
su.figshare.com
researchdata.se
zip
Updated May 11, 2017
Cite
Irvin Homem; Panagiotis Papapetrou (2017). Android Process Memory String Dumps Dataset [Dataset]. http://doi.org/10.17045/sthlmuni.4989773.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.17045/sthlmuni.4989773.v1
Dataset updated
May 11, 2017
Dataset provided by
Stockholm University
Authors
Irvin Homem; Panagiotis Papapetrou
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A dataset containing 2375 samples of Android Process Memory String Dumps. The dataset is broadly composed of 2 classes: "Benign App" Memory Dumps and "Malicious App" Memory Dumps, split into 2 ZIP archives. The ZIP archives total approximately 17GB, while the unzipped contents are approximately 67GB.

This dataset is derived from a subset of the APK files originally made freely available for research through the AndroZoo project [1]. The AndroZoo project collected millions of Android applications and scanned them with the VirusTotal online malware scanning service, thereby classifying most of the apps as either malicious or benign at the time of scanning. The process memory dumps in this dataset were generated by running the subset of APK files from the AndroZoo dataset in an Android Emulator, capturing the process memory of the individual process, and subsequently extracting only the strings from the process memory dump. This was facilitated by building 2 applications, Coriander and AndroMemDumpBeta, which facilitate the running of apps on Android Emulators and the capturing of process memory, respectively. The source code for these software applications is available on GitHub. The individual samples are labelled with the SHA256 hash filename from the original AndroZoo labeling and the application package names extracted from within the specific APK manifest file. They also contain a time-stamp for when the memory dumping process took place for the specific file. The file extension used is ".dmp" to indicate that the files are memory dumps; however, they only contain strings, and thus can be viewed in any simple text editor.

A subset of the first 10000 APK files from the original AndroZoo dataset is also included within this dataset. The metadata of these APK files is present in the file "AndroZoo-First-10000", and the 2375 Android apps that are the main subjects of our dataset are extracted from here.

Our dataset is intended to be used in furthering our research related to Machine Learning-based Triage for Android Memory Forensics. It has been made openly available in order to foster opportunities for collaboration with other researchers, to enable validation of research results, and to enhance the body of knowledge in related areas of research.

References:

[1] K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. AndroZoo: Collecting Millions of Android Apps for the Research Community. Mining Software Repositories (MSR) 2016
Data from: Mobile App Analytics
paradoxintelligence.com
json/csv
Updated May 3, 2025
Cite
Paradox Intelligence (2025). Mobile App Analytics [Dataset]. https://www.paradoxintelligence.com/datasets
Explore at:
Available download formats: json/csv
Dataset updated
May 3, 2025
Dataset authored and provided by
Paradox Intelligence
License
https://www.paradoxintelligence.com/terms
Time period covered
2015 - Present
Area covered
Global
Description
App download rankings, usage metrics, and user engagement data (iOS/Android)
Data from: Classifying code comments in Java Mobile Applications
figshare.com
data.4tu.nl
zip
Updated Jun 6, 2023
Cite
Luca Pascarella (2023). Classifying code comments in Java Mobile Applications [Dataset]. http://doi.org/10.4121/uuid:97f5fc68-0c48-4ea6-b357-184f5b6809c9
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/uuid:97f5fc68-0c48-4ea6-b357-184f5b6809c9
Dataset updated
Jun 6, 2023
Dataset provided by
4TU.ResearchData
Authors
Luca Pascarella
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Detailed comparison of mobile and desktop code comments. Dataset of manually classified Android mobile apps code comments.
Analyzing Application Data
kaggle.com
zip
Updated Feb 9, 2023
Cite
Motola A (2023). Analyzing Application Data [Dataset]. https://www.kaggle.com/motolaa/appanalysis
Explore at:
Available download formats: zip (2766 bytes)
Dataset updated
Feb 9, 2023
Authors
Motola A
Description
Link to code

Description:

The company* that I work for builds iOS & Android mobile applications that are available in the App Store (iOS) and on Google Play (Android). I am a 'data analyst' at this company and am responsible for guiding the software developers in making data-driven decisions about which apps they should build.

*This project was completed as part of a DataQuest course and was not used for a real company.

Plan:

The criteria that my company has laid out for a successful app can be determined as follows:

Create a minimal Android version of the application and add it to Google Play.

The app will be developed further IF it gets a good response from users.

If the app continues to be profitable after 6 months, an iOS version will be built and added to the App Store.

The applications my company builds are all free for users to download and install. Our revenue mainly comes from in-app ads, so the number of users for any given app directly influences our profit.

Goal:

The main goal for this project is to analyze data and give our developers more insight on which kind of apps are more likely to attract users.

Conclusion:

Throughout this project, I analyzed data for the mobile apps in the App Store and Google Play in order to understand which apps would be profitable for both markets. I concluded that turning a popular book into an app could be profitable on both Google Play and the App Store. The team might include an audio version of the book, trivia, an in-app platform for discussion with other users, daily quotes and more within the app.

The two .csv files for analysis: App Store, Google Play Store
Data set of Android permissions
figshare.com
xlsx
Updated May 12, 2018
Cite
Arvind Mahindru (2018). Data set of Android permissions [Dataset]. http://doi.org/10.6084/m9.figshare.5986708.v8
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5986708.v8
Dataset updated
May 12, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Arvind Mahindru
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains 18,850 normal Android application packages and 10,000 malware Android packages, which are used to identify the behaviour of malware applications based on the permissions they need at run time.
Downloads details.
plos.figshare.com
xls
Updated Mar 4, 2024
Cite
Maciej Hyzy; Raymond Bond; Maurice Mulvenna; Lu Bai; Anna-Lena Frey; Jorge Martinez Carracedo; Robert Daly; Simon Leigh (2024). Downloads details. [Dataset]. http://doi.org/10.1371/journal.pone.0298977.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0298977.t003
Dataset updated
Mar 4, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Maciej Hyzy; Raymond Bond; Maurice Mulvenna; Lu Bai; Anna-Lena Frey; Jorge Martinez Carracedo; Robert Daly; Simon Leigh
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Objective: To analyse the relationship between health app quality and both user ratings and the number of downloads of the corresponding health apps.

Materials and methods: Utilising a dataset of 881 Android-based health apps, assessed via the 300-point objective Organisation for the Review of Care and Health Applications (ORCHA) assessment tool, we explored whether subjective user-level indicators of quality (user ratings and downloads) correlate with objective quality scores in the domains of user experience, data privacy and professional/clinical assurance. For this purpose, we applied Spearman correlation and multiple linear regression models.

Results: For user experience, professional/clinical assurance and data privacy scores, all models had very low adjusted R-squared values (< .02), suggesting that there is no meaningful link between subjective user ratings or the number of health app downloads and objective quality measures. Spearman correlations suggested that prior downloads had only a very weak positive correlation with user experience scores (Spearman = .084, p = .012) and data privacy scores (Spearman = .088, p = .009). There was a very weak negative correlation between downloads and professional/clinical assurance score (Spearman = -.081, p = .016). Additionally, user ratings showed only very weak correlations with the scores, none of them statistically significant (all p > 0.05). For ORCHA scores, multiple linear regression had adjusted R-squared = -.002.

Conclusion: This study highlights that widely available proxies which users may perceive to signify the quality of health apps, namely user ratings and downloads, are inaccurate predictors for estimating quality. This indicates the need for wider use of quality assurance methodologies which can accurately determine the quality, safety, and compliance of health apps. Findings suggest more should be done to enable users to recognise high-quality health apps, including digital health literacy training and the provision of nationally endorsed “libraries”.
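
For readers unfamiliar with the statistic reported above, the following is a minimal sketch of a Spearman rank correlation: rank both variables (tied values share the mean of their positions), then take the Pearson correlation of the ranks. It is not the authors' code; the function names are hypothetical and the sketch uses only the Python standard library.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))
```

A coefficient near zero, such as the values of roughly .08 reported in the abstract, means that the ordering of apps by downloads says very little about their ordering by quality score.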
Data from: A dataset from the daily use of features in Android devices
data.mendeley.com
Updated Feb 16, 2024
Cite
Edwin Monteiro (2024). A dataset from the daily use of features in Android devices [Dataset]. http://doi.org/10.17632/bpsrw76hgx.1
Explore at:
Unique identifier
https://doi.org/10.17632/bpsrw76hgx.1
Dataset updated
Feb 16, 2024
Authors
Edwin Monteiro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The energy consumption of Android devices, measured via data collection from features, is a recurring theme in the literature. To evaluate the performance of such devices, databases are generated through the collection of data from features while using the Android operating system. This article describes a database generated from the daily use of smartphones and tablets while performing everyday tasks. The dataset contains 98 features and 10,517,165 records related to dynamic, background, app list and static data. Device records were collected every day from ten distinct devices and stored in CSV files that were later organized to generate a database by cleaning and preprocessing the data, which are publicly available in the Mendeley Data Repository.
Google Play Store App Details
kaggle.com
zip
Updated Jul 19, 2022
Cite
Sourav Ghosh (2022). Google Play Store App Details [Dataset]. https://www.kaggle.com/datasets/souravghosh01/google-play-store-app-details
Explore at:
Available download formats: zip (123503 bytes)
Dataset updated
Jul 19, 2022
Authors
Sourav Ghosh
Description
Google Play Store

Google Play, also branded as the Google Play Store and formerly Android Market, is a digital distribution service operated and developed by Google. It serves as the official app store for certified devices running on the Android operating system and its derivatives as well as Chrome OS, allowing users to browse and download applications developed with the Android software development kit (SDK) and published through Google. Google Play also serves as a digital media store, offering music, books, movies, and television programs. Content that has been purchased on Google Play Movies & TV and Google Play Books can be accessed on a web browser and through the Android and iOS apps.

Content

This dataset contains details of 50 apps, which are categorized into Browsers, Video Players, File Managers, Mobile Payment, and Communication (10 apps from each category). This dataset can be used for prediction.

Some important variables:

title: Title of the app
install: Number of installations
score: Average rating of the app
ratings: Total number of users who rated the app
containsAds: Whether the app is monetized or not
appId: Unique application ID that looks like a Java package name
category: Category to which the app belongs
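
As a rough illustration of how the variables listed above might be used, the sketch below averages `score` per `category` with Python's standard `csv` module. Only the column names come from the description; the sample rows, the constant `SAMPLE_CSV`, and the function name are hypothetical, and the real file's value formats (for example, how `install` is written) are not known from this listing.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for illustration; only the header names come from the dataset description.
SAMPLE_CSV = """title,install,score,ratings,containsAds,appId,category
Example Browser,1000000,4.2,52000,True,com.example.browser,Browsers
Example Player,500000,4.5,31000,False,com.example.player,Video Players
Another Browser,250000,3.8,9000,True,com.example.browser2,Browsers
"""


def mean_score_by_category(csv_text):
    """Return {category: mean of the 'score' column} for the given CSV text."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of scores, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["category"]]
        entry[0] += float(row["score"])
        entry[1] += 1
    return {cat: total / count for cat, (total, count) in totals.items()}


if __name__ == "__main__":
    for category, mean_score in mean_score_by_category(SAMPLE_CSV).items():
        print(f"{category}: {mean_score:.2f}")
```

To run this against the downloaded file, the text of that file would be passed in place of `SAMPLE_CSV`, assuming its header row matches the variable names above.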
Android Ui Objects Dataset
universe.roboflow.com
zip
Updated Dec 4, 2023
Cite
Vito Völker (2023). Android Ui Objects Dataset [Dataset]. https://universe.roboflow.com/vito-volker/android-ui-objects/dataset/7
Explore at:
Available download formats: zip
Dataset updated
Dec 4, 2023
Dataset authored and provided by
Vito Völker
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
App UI Elements Bounding Boxes
Description
Android UI Objects

## Overview

Android UI Objects is a dataset for object detection tasks. It contains App UI Elements annotations for 1,412 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).

