We built a crawler to collect data from the Google Play store, including each application's metadata and APK file. The manifest files were extracted from the APK files and then processed to extract the features. The data set is composed of 870,515 records/apps, and for each app we produced 48 features. The data set was used to build and test two bootstrap aggregating (bagging) ensembles, each made up of multiple XGBoost machine learning classifiers. The data were collected between April 2017 and November 2018. We then checked the status of these applications on three different occasions: December 2018, February 2019, and May-June 2019.
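As a rough illustration of the manifest-processing step, the sketch below derives a handful of features from a manifest that has already been decoded to plain-text XML. Manifests inside an APK are stored in a binary XML format, so a decoding tool is assumed to have run first. The function name `manifest_features` and the feature names are hypothetical; the 48 features of the data set are not listed in this description.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> dict:
    """Derive a small, illustrative feature dict from a decoded manifest.

    The feature names are assumptions made for this example only.
    """
    root = ET.fromstring(manifest_xml)
    permissions = sorted(
        elem.get(ANDROID_NS + "name", "")
        for elem in root.iter("uses-permission")
    )
    app = root.find("application")

    def count(tag: str) -> int:
        # Number of direct <application> children with the given tag.
        return 0 if app is None else len(app.findall(tag))

    return {
        "package": root.get("package", ""),
        "num_permissions": len(permissions),
        "requests_internet": "android.permission.INTERNET" in permissions,
        "num_activities": count("activity"),
        "num_services": count("service"),
        "num_receivers": count("receiver"),
    }


# Hypothetical manifest used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.CAMERA"/>
  <application>
    <activity android:name=".MainActivity"/>
    <service android:name=".SyncService"/>
  </application>
</manifest>"""

if __name__ == "__main__":
    print(manifest_features(SAMPLE))
```

Each returned dict would correspond to one row of a feature table that a classifier could be trained on.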
Over 20 Million iOS and Android Apps
Data Where You Want It, When You Want It
With billions of data points on mobile apps, SDKs, and developers, you'll close more deals and reduce customer churn.
Our app data is refreshed constantly to ensure you and your team have the best mobile intelligence on your side at all times.
Bulk exports, API endpoints, and CRM integrations mean you can move faster with data that works everywhere.
Interested in a deeper integration? Get in touch to learn about our professional services.
Start for as low as $750/month
As of March 2024, around 88 percent of the data linked to users collected by iOS apps was used by app publishers to integrate their product's functionalities. In comparison, 71 percent of app data not directly linked to users had the same function. Collecting analytics data was the second most common reason for apps to collect iOS users' data, while only 17 percent of identifiable user data and 10 percent of non-identifiable users' data went to improve or integrate third-party advertising services.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created using Wireshark. It contains a total of 30 encrypted communication records (.pcap), three for each application. The records were obtained from a mobile device connected to a laptop over Wi-Fi. The laptop was connected to the Internet and ran an instance of Wireshark to create each record. The phone was restarted before each record was created. After connecting to the network, the device was left without user interaction for 5 minutes. At the end of the 5-minute period, Wireshark recording was started, and the given application was then used on the mobile device while the recording ran. The dataset was created to identify mobile applications in encrypted traffic based on TLS fingerprints.
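The description does not say which TLS fingerprinting scheme is intended. As one common possibility, the sketch below builds a JA3-style fingerprint from ClientHello fields. It assumes the fields have already been parsed out of a .pcap record by some other tool; the function names are illustrative.

```python
import hashlib


def is_grease(value: int) -> bool:
    """True for GREASE placeholder values (0x0a0a, 0x1a1a, ..., 0xfafa)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join ClientHello fields in JA3 order, dropping GREASE values.

    Fields are comma-separated; values inside a field are dash-separated
    decimal numbers.
    """
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical ClientHello values, not taken from the dataset.
    print(ja3_string(771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0]))
    print(ja3_hash(771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0]))
```

Because the three records of one application should come from the same TLS stack, their fingerprints can be compared to see whether they agree, and then compared across applications.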
This dataset has been collected from two different sources. The first dataset was taken from [1] and collected by Panichella et al. We obtained this dataset from Dr. Sebastiano Panichella via email. It contains reviews of the AngryBirds, Dropbox, and Evernote apps, taken from Apple’s App Store, and reviews of apps such as TripAdvisor, PicsArt, Pinterest, and WhatsApp, taken from Android’s Google Play store. The dataset consists of 1390 reviews from all the previously mentioned apps, and all reviews were classified into four classes related to software engineering maintenance tasks as follows: 192 reviews as Feature Request (FR), 494 reviews as Problem Discovery (PD), 603 reviews as Information Giving (IG), and 101 reviews as Information Seeking (IS). We refer to this dataset as the “Pan Dataset”. The second dataset is used in [2] and was prepared by Maalej et al. It is available on the Hamburg University website at this direct link (https://mast.informatik.uni-hamburg.de/app-review-analysis). The truth dataset contains 3691 reviews from Google’s and Apple’s app stores. We refer to this dataset as the “Maalej dataset”. All reviews were classified into four classes related to software engineering maintenance tasks as follows: 252 reviews as Feature Request (FR), 370 reviews as Bug Report/Problem Discovery (BR/PD), 607 reviews as User Experience (UE), and 2461 reviews as Rating (RT).
Analysys Qianfan is the data service affiliated with the widely-known third-party digital economy industry research house Analysys. Through a proprietary SDK embedded in more than 30k partner mobile apps, as well as external data provided by the three major wireless carriers in China, Analysys Qianfan is able to monitor online activity for 600m+ monthly active devices, or 86.9m DAU, as of November 2020. The firm identifies ~300 behavioural metrics for mobile internet users and provides this for 30k+ mobile apps across dozens of sub-sectors, including healthcare, food & gourmet, ride sharing, SaaS and financial services. Examples of available metrics include: app name; number of average active users per hour, day, week and month; average time spent on app; penetration rate; retention rate; user age; gender; phone model. Marketers, venture capitalists, and equity investors are using Analysys QF’s data to gauge user penetration and engagement for products and campaigns, identify investable start-ups, and monitor the popularity of mobile app products and services for listed companies. Data is presented in Chinese, Japanese and English and accessible via email, website or Weixin mini program. Annual pricing fees are dependent on the required universe coverage but typically range between 100k-1m RMB per annum.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of the permissions that apps require at installation and at run time. We collected apps from three different sources: Google Play, third-party app stores, and a malware dataset. The dataset covers more than 500,000 Android apps, with features extracted at installation and at execution time. One file contains the names of the features, and the others contain, for each .apk file, its extracted permissions and API calls. Benign apps were collected from Google's Play Store, hiapk, app china, Android, mumayi, gfan, slideme, and pandaapp. The .apk files were collected continuously over the last three years, and the malware samples belong to 81 distinct malware families.
Dataset Card for MobileRec
Dataset Summary
MobileRec is a large-scale app recommendation dataset. There are 19.3 million user-item interactions. This is a 5-core dataset. User-item interactions are sorted in ascending chronological order. There are 0.7 million users who have had at least five distinct interactions. There are 10173 apps in total.
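The summary does not define "5-core". Under the definition commonly used for recommendation data, every remaining user and every remaining item has at least k interactions. The sketch below applies that definition to a list of (user, item) pairs; the function name `k_core` and the toy data are illustrative, and whether MobileRec was filtered on both sides is an assumption.

```python
from collections import Counter


def k_core(interactions, k=5):
    """Iteratively drop users and items with fewer than k interactions.

    `interactions` is a sequence of distinct (user, item) pairs. The input
    order is preserved, so chronologically sorted data stays sorted.
    """
    current = list(interactions)
    while True:
        users = Counter(u for u, _ in current)
        items = Counter(i for _, i in current)
        kept = [(u, i) for u, i in current if users[u] >= k and items[i] >= k]
        if len(kept) == len(current):
            # Fixed point reached: nothing was removed in this pass.
            return kept
        current = kept


if __name__ == "__main__":
    # Toy data with k=2: item "c" is dropped first, which then drops user "u3".
    data = [("u1", "a"), ("u1", "b"), ("u2", "a"),
            ("u2", "b"), ("u3", "a"), ("u3", "c")]
    print(k_core(data, k=2))
```

The loop is needed because removing a sparse item can push a user below the threshold, and vice versa, so a single pass is not always enough.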
Supported Tasks and Leaderboards
Sequential Recommendation
Languages
English… See the full description on the dataset page: https://huggingface.co/datasets/recmeapp/mobilerec.
As of March 2024, StartApp dominated among Android apps that used a mobile data intelligence software development kit (SDK), with nearly 80 percent integration reach. Second-ranked Umlaut P3 was integrated with over six percent of apps that used data intelligence SDKs.
This dataset contains 54,987 UI screenshots and the metadata from 7,748 Android applications belonging to 25 application categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AndroR2 is a dataset of 90 manually reproduced bug reports for Android apps listed on Google Play and hosted on GitHub, systematically collected via an in-depth analysis of 459 reports extracted from the GitHub issue tracker. For each reproduced report, AndroR2 includes the original bug report, an apk file for the buggy version of the app, an executable reproduction script, and metadata regarding the quality of the reproduction steps associated with the original report. We believe that the AndroR2 dataset can be used to facilitate research in automatically analyzing, understanding, reproducing, localizing, and fixing bugs for mobile applications as well as other software maintenance activities more broadly.
While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found out that the iTunes App Store page uses a nicely indexed, appendix-like structure that allows for simple and easy web scraping. The Google Play Store, on the other hand, uses sophisticated modern-day techniques (like dynamic page load) with jQuery, which makes scraping more challenging.
Each app (row) has values for category, rating, size, and more.
This information is scraped from the Google Play Store. This app information would not be available without it.
The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!
iOS app reviews dataset
Fields:
rating
review_date
app_name
tags
country
title
app_id
content
version
link
_id
Complete dataset:
total records: 1.9M+
number fields: 12
countries covered: 102
Download complete dataset from crawl feeds
https://www.gesis.org/fileadmin/upload/Datenservices/Nutzungsbedingungen/2023-06-30_Usage_regulations.pdf
The goal of this study is to measure willingness to participate in passive mobile data collection among German smartphone owners. The data come from a two-wave web survey among German smartphone users 18 years and older who were recruited from a German nonprobability online panel. In December 2016, 2,623 participants completed the Wave 1 questionnaire on smartphone use and skills, privacy and security concerns, and general attitudes towards survey research and research institutions. In January 2017, all respondents from Wave 1 were invited to participate in a second web survey which included vignettes that varied the levels of several dimensions of a hypothetical study using passive mobile data collection, and respondents were asked to rate their willingness to participate in such a study. A total of 1,957 respondents completed the Wave 2 questionnaire.
Wave 1
Topics: Ownership of smartphone, mobile phone, PC, tablet, and/or e-book reader; type of smartphone; frequency of smartphone use; smartphone activities (browsing, e-mails, taking photos, viewing/posting social media content, shopping, online banking, installing apps, using GPS-enabled apps, connecting via Bluetooth, playing games, streaming music/videos); self-assessment of smartphone skills; attitude towards surveys and participation in research studies (personal interest, waste of time, sales pitch, interesting experience, useful); trust in institutions regarding data privacy (market research companies, university researchers, statistical office, mobile service provider, app companies, credit card companies, online retailer, and social networks); concerns regarding the disclosure of personal data by the aforementioned institutions; general privacy concern; privacy violated by banks/credit card companies, tax authorities, government agencies, market research companies, social networks, apps, internet browsers; concern regarding data security with smartphone activities for research (online survey, survey apps, research apps, SMS survey, camera, activity data, GPS location, Bluetooth); number of online surveys in which the respondent has participated in the last 30 days; panel memberships other than that of mingle; previous participation in a study with downloading a research app to the smartphone (passive mobile data collection).
Wave 2
Topics: Willingness to participate in passive mobile data collection (using eight vignettes with different scenarios that varied the levels of several dimensions of a hypothetical study using passive mobile data collection; the research app collects the following data for research purposes: technical characteristics of the smartphone (e.g. phone brand, screen size), the currently used telephone network (e.g. signal strength), the current location (every 5 minutes), which apps are used and which websites are visited, number of incoming and outgoing calls and SMS messages on the smartphone); reason why the respondent wouldn't (respectively would) participate in the research study used in the first scenario (open answer); recognition of differences between the eight scenarios; kind of recognized difference (open answer); remembered data the research app collects (recall); previous invitation for research app download; research app download.
Demography: sex; age; federal state; highest level of school education; highest level of vocational qualification.
Additionally coded was: running number; respondent ID; duration (response time in seconds); device type used to fill out the questionnaire; vignette text; vignette intro time; vignette time.
The name and download numbers of government mobile apps.
As of May 2023, Facebook collected the largest number of unique data points from global iOS users, around 32 data points. Popular digital payment app PayPal and Airbnb collected 26 data points each, while the AI-powered photo and image editing app Photoleap collected around 14 unique data points.
The Collective is the largest independent global data marketplace (DMP) with over ten billion all-time MAIDs and over 68M MAU and 8M DAU in our App Activity data feed.
We also take great pride in the curation of our supplier network which is comprised of over 50 top tier, privacy first, and fully consented aggregators and publishers to ensure compliance, quality and scale.
This listing focuses on app usage data, SDK data, and app session data. Do not hesitate to contact Collective Data Solutions to access the best mobile app data on the market.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Apple App Store Key Statistics
Apps & Games in the Apple App Store
Apps in the Apple App Store
Games in the Apple App Store
Most Popular Apple App Store Categories
Paid vs Free Apps in Apple App...
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of online malicious activities, including email scams, ransomware, and malicious domains. As the number of cases continues to surge, malicious campaigns that use the coronavirus as a lure are increasing. Malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps. However, apart from a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset is available for researchers to boost COVID-19 related cybersecurity studies. We are working to create a daily growing COVID-19 related mobile app dataset. As of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number is growing daily, and our dataset will be updated weekly. For more details, please visit https://covid19apps.github.io
This dataset includes the following files:
(1) covid19apps.xlsx: lists information on all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.
(2) covid19apps.zip: contains the APK samples of the COVID-19 themed apps. To reduce the size of any single file, the samples are split across multiple zip files. Each APK file is named after its SHA256 hash.
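Since the APK samples are named after their SHA256 hashes, a downloaded sample can be checked against its own file name. The sketch below is a minimal illustration. It assumes the name, minus any extension, is the hex digest; the exact naming pattern (extension, letter case) is not stated here, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path: Path) -> bool:
    """Compare the file name (without extension) to the file's SHA256.

    The comparison ignores letter case in the file name.
    """
    return path.stem.lower() == sha256_of(path)


if __name__ == "__main__":
    # Hypothetical folder of extracted samples; adjust to the local layout.
    for apk in sorted(Path("samples").glob("*")):
        if apk.is_file():
            print(apk.name, "OK" if name_matches_hash(apk) else "MISMATCH")
```

A mismatch would suggest a corrupted download or a renamed file, so it is a cheap check to run before any analysis.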
If your papers or articles use our dataset, please use the following BibTeX reference to cite our paper: https://arxiv.org/abs/2005.14619 (accepted to Empirical Software Engineering)
@misc{wang2021virus,
  title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware},
  author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu},
  year={2021},
  eprint={2005.14619},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}
Rico is a public UI corpus with 72K Android UI screens mined from 9.7K Android apps (Deka et al., 2017). Each screen in Rico comes with a screenshot image and a view hierarchy of a collection of UI objects. The authors removed screens whose view hierarchies do not match their screenshots by asking annotators to visually verify whether the bounding boxes of the view hierarchy leaves match each UI object on the corresponding screenshot image. This filtering results in 25K unique screens.
In total, RICOSCA contains 295,476 single-step synthetic commands for operating 177,962 different target objects across 25,677 Android screens.