Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.
Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN https://interactionmining.org/rico
The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
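The input construction and example-based search described above can be sketched as follows. This is a minimal illustration, not Rico's actual pipeline: the node keys (`bounds`, `children`, `text`), the grid size, the cell codes, and the function names are all assumptions made for the example, and a leaf is treated as text simply when it carries a non-empty `text` value.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical cell codes for the rasterized layout grid.
EMPTY, TEXT, NON_TEXT = 0, 1, 2


def leaf_elements(node: Dict) -> List[Dict]:
    """Collect the leaf nodes (nodes without children) of a view hierarchy."""
    children = node.get("children") or []
    if not children:
        return [node]
    leaves: List[Dict] = []
    for child in children:
        leaves.extend(leaf_elements(child))
    return leaves


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def rasterize_layout(root: Dict, screen_w: int, screen_h: int,
                     grid_w: int, grid_h: int) -> List[List[int]]:
    """Paint each leaf's bounding box into a small grid.

    Assumes `bounds` is [left, top, right, bottom] in screen pixels.
    """
    grid = [[EMPTY] * grid_w for _ in range(grid_h)]
    for leaf in leaf_elements(root):
        left, top, right, bottom = leaf["bounds"]
        # Scale pixel coordinates to grid cells and clamp to the grid.
        x0 = max(0, left * grid_w // screen_w)
        y0 = max(0, top * grid_h // screen_h)
        x1 = min(grid_w, ceil_div(right * grid_w, screen_w))
        y1 = min(grid_h, ceil_div(bottom * grid_h, screen_h))
        code = TEXT if leaf.get("text") else NON_TEXT
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = code
    return grid


def nearest(query: Sequence[float],
            embeddings: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the ids of the k embeddings closest to `query` (Euclidean)."""
    ranked = sorted(embeddings.items(),
                    key=lambda item: math.dist(query, item[1]))
    return [ui_id for ui_id, _ in ranked[:k]]
```

In this sketch the grid would be fed to an autoencoder, and `nearest` would run over the resulting 64-dimensional vectors; a brute-force scan is used here only for clarity.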
https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: (not specified)
Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
- review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
- title: The title given to the review by the user, often summarizing their main point.
- isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
Reviewer & Rating Information:
- username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
- rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
App & Origin Context:
- app_name: The name of the application being reviewed.
- app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
- country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
Metadata & Timestamps:
- _id: A unique identifier for the specific review record in the dataset.
- crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
- date: The original date the review was posted by the user on the App Store.
Expanded Use Cases & Analytical Applications:
This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
- review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
- review text to inform future product roadmap decisions and develop features users actively desire.
- review field.
- rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
Market Research & Competitive Intelligence:
Marketing & App Store Optimization (ASO):
- review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
- rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
Academic & Data Science Research:
- review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
- rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
- country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.
This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs and ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
We built a crawler to collect data from the Google Play store, including each application's metadata and APK file. The manifest files were extracted from the APK files and then processed to extract the features. The data set comprises 870,515 records (apps), and for each app we produced 48 features. The data set was used to build and test two bootstrap-aggregated (bagged) ensembles of XGBoost machine learning classifiers. The data was collected between April 2017 and November 2018. We then checked the status of these applications on three occasions: December 2018, February 2019, and May-June 2019.
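As a rough illustration of turning a manifest into features, the sketch below parses a manifest that has already been decoded to plain-text XML (the manifest inside an APK is stored in a binary XML encoding, so a decoding step is assumed to have run first). The feature names and the function `manifest_features` are hypothetical and are not the 48 features of the data set.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> dict:
    """Derive a few illustrative features from a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    permissions = [
        el.get(ANDROID_NS + "name", "")
        for el in root.iter("uses-permission")
    ]
    app = root.find("application")

    def count(tag: str) -> int:
        # Components are declared as direct children of <application>.
        return len(app.findall(tag)) if app is not None else 0

    return {
        "package": root.get("package", ""),
        "num_permissions": len(permissions),
        "requests_internet": "android.permission.INTERNET" in permissions,
        "num_activities": count("activity"),
        "num_services": count("service"),
        "num_receivers": count("receiver"),
    }


# A tiny hand-written manifest used only to exercise the function.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.CAMERA"/>
  <application>
    <activity android:name=".MainActivity"/>
    <service android:name=".SyncService"/>
  </application>
</manifest>"""

features = manifest_features(SAMPLE_MANIFEST)
```

Each app would yield one such dictionary, which could then be flattened into a row of the feature table used to train the classifiers.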
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
If you use this dataset anywhere in your work, please cite it as follows: L. Gupta, "Google Play Store Apps," Feb 2019. [Online]. Available: https://www.kaggle.com/lava18/google-play-store-apps
While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found that the iTunes App Store page uses a nicely indexed, appendix-like structure that allows for simple and easy web scraping. The Google Play Store, on the other hand, uses more sophisticated techniques (like dynamic page loading) based on jQuery, which makes scraping more challenging.
Each app (row) has values for category, rating, size, and more.
This information is scraped from the Google Play Store. This app information would not be available without it.
The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!
Aaditya1/Mobile-Application-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of malicious online activities, including email scams, ransomware, and malicious domains. As the number of cases continues to surge, malicious campaigns that use the coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps.
However, apart from a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset is available for researchers to boost COVID-19 related cybersecurity studies.
We are working to create a daily growing COVID-19 related mobile app dataset. As of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number is growing daily, and our dataset will be updated weekly. For more details, please visit https://covid19apps.github.io
This dataset includes the following files:
(1) covid19apps.xlsx
This file lists information on all of the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.
(2) covid19apps.zip
The APK samples of the COVID-19 themed apps are stored in zip files. To reduce the size of any single file, the samples are split across multiple zip files. Each APK file is named after its SHA256 hash.
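Because each sample is named after its SHA256 hash, an extracted file can be checked against its own name. The sketch below assumes the file name is the hex digest, optionally followed by an extension such as `.apk`; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large APKs are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path) -> bool:
    """Check that a file's name (without extension) equals its SHA256."""
    return Path(path).stem.lower() == sha256_of_file(path)
```

A mismatch would suggest a corrupted download or a renamed file, so running this over an extracted archive is a cheap integrity check before analysis.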
If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619
(Accepted to Empirical Software Engineering)
@misc{wang2021virus, title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware}, author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu}, year={2021}, eprint={2005.14619}, archivePrefix={arXiv}, primaryClass={cs.CR} }
As of May 2023, Facebook collected the largest number of unique data points from global iOS users, around 32 data points. Popular digital payment app PayPal and Airbnb collected 26 data points each, while the AI photo and image editing app Photoleap collected around 14 unique data points.
https://crawlfeeds.com/privacy_policy
Unlock the power of user feedback with our iOS App Store Reviews Dataset, a comprehensive collection of reviews from thousands of apps across various categories. This robust App Store dataset includes essential details such as app names, ratings, user comments, timestamps, and more, offering valuable insights into user experiences and preferences.
Perfect for app developers, marketers, and data analysts, this dataset allows you to conduct sentiment analysis, monitor app performance, and identify trends in user behavior. By leveraging the iOS App Store Reviews Dataset, you can refine app features, optimize marketing strategies, and elevate user satisfaction.
Whether you’re tracking mobile app trends, analyzing specific app categories, or developing data-driven strategies, this App Store dataset is an indispensable tool. Download the iOS App Store Reviews Dataset today or contact us for custom datasets tailored to your unique project requirements.
Ready to take your app insights to the next level? Get the iOS App Store Reviews Dataset now or explore our custom data solutions to meet your needs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Build mobile apps with Ionic 2 and Firebase : hybrid mobile app development. It features 7 columns including author, publication date, language, and book publisher.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include USA, China, Japan, Germany, France, Brazil, UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about: (1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download; (2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income; and (3) their personality, using the Big-Five personality traits. This dataset contains the results of the survey.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
App Download Key Statistics; App and Game Downloads; iOS App and Game Downloads; Google Play App and Game Downloads; Game Downloads; iOS Game Downloads; Google Play Game Downloads; App Downloads; iOS App...
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This comprehensive synthetic dataset contains 2,514 authentic mobile app reviews spanning 40+ popular applications across 24 different languages, making it ideal for multilingual NLP, sentiment analysis, and cross-cultural user behavior research.
Column Name | Data Type | Description | Sample Values | Null Count |
---|---|---|---|---|
review_id | Integer | Unique identifier for each review | 1, 2, 3, ... | 0 |
user_id | String* | User identifier (should be integer) | "1967825", "9242600" | 0 |
app_name | String | Name of the mobile application | WhatsApp, Instagram, TikTok | 0 |
app_category | String | Application category | Social Networking, Entertainment | 0 |
review_text | String | Multilingual review content | "This app is amazing!" | 63 |
review_language | String | ISO language code | en, es, fr, zh, hi, ar | 0 |
rating | Mixed* | App rating (1.0-5.0, some as strings) | 4.5, "3.2", 1.1 | 38 |
review_date | DateTime | Timestamp of review submission | 2024-10-09 19:26:40 | 0 |
verified_purchase | Boolean | Purchase verification status | True, False | 0 |
device_type | String | Device platform | Android, iOS, iPad, Windows Phone | 0 |
num_helpful_votes | Mixed* | Helpfulness votes (some as strings) | 65, "209", 163 | 0 |
user_age | Float* | User age (should be integer) | 14.0, 18.0, 67.0 | 0 |
user_country | String | User's country | China, Germany, Nigeria | 50 |
user_gender | String | User gender | Male, Female, Non-binary, Prefer not to say | 88 |
app_version | String | Application version number | 1.4, v8.9, 2.8.37.5926 | 25 |
Note: Data types marked with asterisk require cleaning/conversion
The dataset includes reviews in 24 languages:
- European: English (en), Spanish (es), French (fr), German (de), Italian (it), Russian (ru), Polish (pl), Dutch (nl), Swedish (sv), Danish (da), Norwegian (no), Finnish (fi)
- Asian: Chinese (zh), Hindi (hi), Japanese (ja), Korean (ko), Thai (th), Vietnamese (vi), Indonesian (id), Malay (ms)
- Other: Arabic (ar), Turkish (tr), Filipino (tl)
Reviews cover 18 distinct categories:
- Social Networking
- Entertainment
- Productivity
- Travel & Local
- Music & Audio
- Video Players & Editors
- Shopping
- Navigation
- Finance
- Communication
- Education
- Photography
- Dating
- Business
- Utilities
- Health & Fitness
- Games
- News & Magazines
40+ applications including:
- Social: WhatsApp, Instagram, Facebook, Snapchat, TikTok, LinkedIn, Twitter, Reddit, Pinterest
- Entertainment: YouTube, Netflix, Spotify
- Productivity: Microsoft Office, Google Drive, Dropbox, OneDrive, Zoom, Discord
- Travel: Uber, Lyft, Airbnb, Booking.com, Google Maps, Waze
- Finance: PayPal, Venmo
- Education: Duolingo, Khan Academy, Coursera, Udemy
- Tools: Grammarly, Canva, Adobe Photoshop, VLC, MX Player
Reviews from 24 countries across all continents:
- Asia: China, India, Japan, South Korea, Thailand, Vietnam, Indonesia, Malaysia, Philippines, Pakistan, Bangladesh
- Europe: Germany, United Kingdom, France, Italy, Spain, Russia, Turkey, Poland
- Americas: United States, Canada, Brazil, Mexico
- Oceania: Australia
- Africa: Nigeria
Intentional data challenges for learning:
- Missing Values: Strategic nulls in review_text (63), rating (38), user_country (50), user_gender (88), app_version (25)
- Data Type Issues:
  - user_id stored as strings (should be integers)
  - user_age as floats (should be integers)
  - Some ratings as strings (should be floats)
  - Some num_helpful_votes as strings (should be integers)
- Mixed Version Formats: "1.4", "v8.9", "2.8.37.5926", "14.1.60.318-beta"
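One way to handle the type issues listed above is sketched below, using only the standard library. The column names come from the schema table; the helper names and the choice to map unparseable values to `None` are assumptions made for this example, and stripping a leading "v" is just one possible way to reconcile the version formats.

```python
import math
from typing import Optional


def to_int(value) -> Optional[int]:
    """Convert values such as "209", 65, or 14.0 to int; None if missing."""
    if value is None or value == "":
        return None
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def to_float(value) -> Optional[float]:
    """Convert values such as "3.2" or 4.5 to float; None if missing or NaN."""
    if value is None or value == "":
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def normalize_version(value) -> Optional[str]:
    """Strip whitespace and a leading "v" so "v8.9" and "8.9" compare equal."""
    if not value:
        return None
    return str(value).strip().lstrip("vV")


def clean_row(row: dict) -> dict:
    """Return a copy of a review row with the flagged columns converted."""
    cleaned = dict(row)
    for column in ("user_id", "user_age", "num_helpful_votes"):
        if column in cleaned:
            cleaned[column] = to_int(cleaned[column])
    if "rating" in cleaned:
        cleaned["rating"] = to_float(cleaned["rating"])
    if "app_version" in cleaned:
        cleaned["app_version"] = normalize_version(cleaned["app_version"])
    return cleaned
```

Rows read with `csv.DictReader` could be passed through `clean_row` one at a time; suffixes such as "-beta" are left untouched in this sketch.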
This dataset is perfect for:
- Multilingual NLP projects and sentiment analysis
- Cross-cultural user behavior analysis
- App store analytics and rating prediction
- Data cleaning and preprocessing practice
- Text classification across multiple languages
- Time series analysis of app reviews
- Geographic sentiment analysis
- Data engineering pipeline development
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nowadays, mobile applications (a.k.a. apps) are used by over two billion users for every type of need, including social and emergency connectivity. Their pervasiveness in today's world has inspired the software testing research community to devise approaches that allow developers to better test their apps and improve the quality of the tests being developed. In spite of this research effort, we still notice a lack of empirical analyses aimed at assessing the actual quality of test cases manually developed by mobile developers: this perspective could provide evidence-based findings on future research directions in the field as well as on the current status of testing in the wild. As such, we performed a large-scale empirical study targeting 1,780 open-source Android apps and aimed at assessing (1) the extent to which these apps are actually tested, (2) how well-designed the available tests are, and (3) how effective they are. The key results of our study show that mobile developers still tend not to properly test their apps, possibly because of time-to-market requirements. Furthermore, we discovered that the test cases of the considered apps have low (i) design quality, both in terms of test code metrics and test smells, and (ii) effectiveness, when considering code coverage as well as assertion density.
As of May 2023, product interaction data were the most commonly collected data points, with 94 of the 100 analyzed apps reporting that they collect such data. User ID and crash data were collected by 93 and 92 of the 100 apps, respectively. All of the 10 leading shopping apps hosted on the Apple App Store collected precise location, physical address, and payment info.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Object Detection Mobile App is a dataset for object detection tasks - it contains Objects annotations for 2,255 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://crawlfeeds.com/privacy_policy
This comprehensive iOS application reviews dataset contains thousands of authentic user reviews from the Apple App Store in English. The dataset provides valuable insights for app developers, marketers, and researchers studying mobile application performance and user sentiment.
Key Features:
Applications: Perfect for sentiment analysis, app store optimization, mobile app development research, user experience studies, and competitive analysis. This dataset enables businesses to understand user preferences, identify app improvement opportunities, and develop better mobile applications.
Data Quality: All reviews are genuine user feedback collected from the official Apple App Store, ensuring authenticity and reliability for research and business intelligence purposes. The dataset covers various app categories including fitness, shopping, education, entertainment, and productivity applications.
As of August 2022, language learning app HelloTalk and Google Classroom, Google's meeting point for schools, were the educational apps collecting the largest number of data points. ClassDojo and popular language learning app Duolingo followed, collecting approximately ** different data points from global Android users.
https://brightdata.com/license
This dataset encompasses a wide-ranging collection of Google Play applications, providing a holistic view of the diverse ecosystem within the platform. It includes information on various attributes such as the title, developer, monetization features, images, app descriptions, data safety measures, user ratings, number of reviews, star rating distributions, user feedback, recent updates, related applications by the same developer, content ratings, estimated downloads, and timestamps. By aggregating this data, the dataset offers researchers, developers, and analysts an extensive resource to explore and analyze trends, patterns, and dynamics within the Google Play Store. Researchers can utilize this dataset to conduct comprehensive studies on user behavior, market trends, and the impact of various factors on app success. Developers can leverage the insights derived from this dataset to inform their app development strategies, improve user engagement, and optimize monetization techniques. Analysts can employ the dataset to identify emerging trends, assess the performance of different categories of applications, and gain valuable insights into consumer preferences. Overall, this dataset serves as a valuable tool for understanding the broader landscape of the Google Play Store and unlocking actionable insights for various stakeholders in the mobile app industry.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Key US App Market Statistics; US App Market Size; US App Market Revenue by App; US Smartphone Users; US Smartphone Population; Time Spent on Apps in the US; US App Market Downloads; US Downloads by App; US Daily...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
the device was left without user interaction for 5 minutes.