MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro”).
Screen Resolution: Typical smartphone screen resolutions (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: The user’s app language setting (e.g., “en,” “es,” “fr”).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share”).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field providing additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells may be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, and more.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain).
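As a rough sketch of the cleaning exercise this noise enables (the column name device_os, the sample rows, and the repair heuristic are illustrative assumptions, not part of the dataset):

```python
import re

# Hypothetical rows mimicking the described noise: a single-character typo
# ("Andro1d"), a missing value (None), and a random string injection.
rows = [
    {"device_os": "Android", "battery_level": 85},
    {"device_os": "Andro1d", "battery_level": None},
    {"device_os": "x9Qz7", "battery_level": 42},
]

VALID_OS = {"Android", "iOS"}

def clean_os(value):
    """Map near-miss spellings to a known OS; treat everything else as missing."""
    if value in VALID_OS:
        return value
    if value is None:
        return None
    # Drop injected digits, then match the prefix against the valid OS names.
    normalized = re.sub(r"[0-9]", "", value).lower()
    for os_name in VALID_OS:
        if normalized and os_name.lower().startswith(normalized[:4]):
            return os_name
    return None  # unrecognized injections become missing values

cleaned = [{**r, "device_os": clean_os(r["device_os"])} for r in rows]
```

The same pattern extends to the other corrupted fields: define the valid domain per column, then normalize or null out anything that falls outside it.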
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals. Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a synthetic representation of user behavior on a fictional dating app. It contains 50,000 records with 19 features capturing demographic details, app usage patterns, swipe tendencies, and match outcomes. The data was generated programmatically to simulate realistic user interactions, making it ideal for exploratory data analysis (EDA), machine learning modeling (e.g., predicting match outcomes), or studying user behavior trends in online dating platforms.
Key features include gender, sexual orientation, location type, income bracket, education level, user interests, app usage time, swipe ratios, likes received, mutual matches, and match outcomes (e.g., "Mutual Match," "Ghosted," "Catfished"). The dataset is designed to be diverse and balanced, with categorical, numerical, and labeled variables for various analytical purposes.
This dataset can be used for:
Exploratory Data Analysis (EDA): Investigate correlations between demographics, app usage, and match success. Machine Learning: Build models to predict match outcomes or user engagement levels. Social Studies: Analyze trends in dating app behavior across different demographics. Feature Engineering Practice: Experiment with transforming categorical and numerical data.
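A minimal sketch of the kind of EDA this supports; the field names income_bracket and match_outcome follow the description above, but the exact schema and the sample rows are assumptions:

```python
from collections import Counter

# Hypothetical records following the described schema.
records = [
    {"income_bracket": "high", "match_outcome": "Mutual Match"},
    {"income_bracket": "high", "match_outcome": "Ghosted"},
    {"income_bracket": "low",  "match_outcome": "Mutual Match"},
    {"income_bracket": "low",  "match_outcome": "Mutual Match"},
]

# Mutual-match rate per income bracket.
totals = Counter(r["income_bracket"] for r in records)
matches = Counter(
    r["income_bracket"] for r in records if r["match_outcome"] == "Mutual Match"
)
match_rate = {k: matches[k] / totals[k] for k in totals}
```

The same grouping logic generalizes to any categorical feature (education level, location type, etc.) against any labeled outcome.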
https://crawlfeeds.com/privacy_policy
This dataset offers a focused and invaluable window into user perceptions and experiences with applications listed on the Apple App Store. It is a vital resource for app developers, product managers, market analysts, and anyone seeking to understand the direct voice of the customer in the dynamic mobile app ecosystem.
Dataset Specifications:
Last crawled: not specified.
Richness of Detail (11 Comprehensive Fields):
Each record in this dataset provides a detailed breakdown of a single App Store review, enabling multi-dimensional analysis:
Review Content:
review: The full text of the user's written feedback, crucial for Natural Language Processing (NLP) to extract themes, sentiment, and common keywords.
title: The title given to the review by the user, often summarizing their main point.
isEdited: A boolean flag indicating whether the review has been edited by the user since its initial submission. This can be important for tracking evolving sentiment or understanding user behavior.
Reviewer & Rating Information:
username: The public username of the reviewer, allowing for analysis of engagement patterns from specific users (though not personally identifiable).
rating: The star rating (typically 1-5) given by the user, providing a quantifiable measure of satisfaction.
App & Origin Context:
app_name: The name of the application being reviewed.
app_id: A unique identifier for the application within the App Store, enabling direct linking to app details or other datasets.
country: The country of the App Store storefront where the review was left, allowing for geographic segmentation of feedback.
Metadata & Timestamps:
_id: A unique identifier for the specific review record in the dataset.
crawled_at: The timestamp indicating when this particular review record was collected by the data provider (Crawl Feeds).
date: The original date the review was posted by the user on the App Store.
Expanded Use Cases & Analytical Applications:
This dataset is a goldmine for understanding what users truly think and feel about mobile applications. Here's how it can be leveraged:
Product Development & Improvement:
- Analyze review text to identify recurring technical issues, crashes, or bugs, allowing developers to prioritize fixes based on user impact.
- Mine review text to inform future product roadmap decisions and develop features users actively desire.
- Extract further qualitative feedback from the review field.
- Track rating and sentiment after new app updates to assess the effectiveness of bug fixes or new features.
Market Research & Competitive Intelligence:
Marketing & App Store Optimization (ASO):
- Analyze the review and title fields to gauge overall user satisfaction, pinpoint specific positive and negative aspects, and track sentiment shifts over time.
- Monitor rating trends and identify critical reviews quickly to facilitate timely responses and proactive customer engagement.
Academic & Data Science Research:
- The review and title fields are excellent for training and testing NLP models for sentiment analysis, topic modeling, named entity recognition, and text summarization.
- Study the rating distribution, isEdited status, and date to understand user engagement and feedback cycles.
- Compare country-specific reviews to understand regional differences in app perception, feature preferences, or cultural nuances in feedback.
This App Store Reviews dataset provides a direct, unfiltered conduit to understanding user needs, ultimately driving better app performance and greater user satisfaction. Its structured format and granular detail make it an indispensable asset for data-driven decision-making in the mobile app industry.
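As one illustration of the NLP use cases above, a toy lexicon-based sentiment scorer over the review and title fields; the word lists and sample reviews are illustrative assumptions, not a real sentiment lexicon:

```python
# Tiny illustrative word lists; a real analysis would use a proper
# sentiment lexicon or a trained model.
POSITIVE = {"great", "love", "excellent", "useful"}
NEGATIVE = {"crash", "bug", "terrible", "slow"}

def score_review(text):
    """Count positive minus negative words in lowercased text."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    {"title": "Love it", "review": "Great app, very useful"},
    {"title": "Broken", "review": "It would crash on launch, terrible bug"},
]
scores = [score_review(r["title"] + " " + r["review"]) for r in reviews]
```

Aggregating such scores by app_id, country, or date gives the sentiment-trend views described above.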
This dataset contains 54,987 UI screenshots and metadata from 7,748 Android applications belonging to 25 application categories.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises 26,261 user reviews of the BCA Mobile app collected from the Google Play Store between June 1, 2023, and May 31, 2024. Each review includes the user's name, the rating they provided (ranging from 1 to 5 stars), the timestamp of when the review was created, and the text content of the review. The dataset is in Indonesian and focuses on feedback from users in Indonesia. This data can be used to perform sentiment analysis, understand user experiences, identify common issues, and assess the overall performance of the BCA Mobile app during the specified timeframe. The reviews are sorted based on the newest first, providing the latest feedback at the top.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We surveyed 10,208 people from more than 15 countries on their mobile app usage behavior. The countries include the USA, China, Japan, Germany, France, Brazil, UK, Italy, Russia, India, Canada, Spain, Australia, Mexico, and South Korea. We asked respondents about: (1) their mobile app usage behavior, including the app stores they use, what triggers them to look for apps, why they download apps, why they abandon apps, and the types of apps they download; (2) their demographics, including gender, age, marital status, nationality, country of residence, first language, ethnicity, education level, occupation, and household income; and (3) their personality, using the Big-Five personality traits. This dataset contains the results of the survey.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A dataset consisting of 751,500 English app reviews of 12 online shopping apps, scraped from the internet using a Python script. This ShoppingAppReviews dataset contains app reviews of the 12 most popular online shopping Android apps: Alibaba, AliExpress, Amazon, Daraz, eBay, Flipkart, Lazada, Meesho, Myntra, Shein, Snapdeal, and Walmart. Each review entry contains metadata such as the review score, thumbs-up count, review posting time, and reply content. The dataset is organized as a zip file containing 12 JSON files and 12 CSV files, one of each per app. This dataset can be used to obtain valuable information about customers' feedback regarding their user experience with these financially important apps.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 6 columns and 10k rows about the demographics of the users of an app.
UID - User ID, a unique identifier for every app user.
reg_date - Date that each user registered.
device - Operating system of the user.
Gender - Gender of the user.
Country - Country where the user downloaded the app.
Age - Age of the user.
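A small sketch of aggregating these columns; the rows are invented for illustration, and the column names follow the description above:

```python
from collections import Counter

# Hypothetical rows in the described six-column schema (extra columns omitted).
users = [
    {"UID": 1, "device": "Android", "Country": "US", "Age": 25},
    {"UID": 2, "device": "iOS",     "Country": "US", "Age": 31},
    {"UID": 3, "device": "Android", "Country": "BR", "Age": 19},
]

# Count users per country and compute the mean age.
users_by_country = Counter(u["Country"] for u in users)
avg_age = sum(u["Age"] for u in users) / len(users)
```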
https://brightdata.com/license
This dataset encompasses a wide-ranging collection of Google Play applications, providing a holistic view of the diverse ecosystem within the platform. It includes information on various attributes such as the title, developer, monetization features, images, app descriptions, data safety measures, user ratings, number of reviews, star rating distributions, user feedback, recent updates, related applications by the same developer, content ratings, estimated downloads, and timestamps. By aggregating this data, the dataset offers researchers, developers, and analysts an extensive resource to explore and analyze trends, patterns, and dynamics within the Google Play Store. Researchers can utilize this dataset to conduct comprehensive studies on user behavior, market trends, and the impact of various factors on app success. Developers can leverage the insights derived from this dataset to inform their app development strategies, improve user engagement, and optimize monetization techniques. Analysts can employ the dataset to identify emerging trends, assess the performance of different categories of applications, and gain valuable insights into consumer preferences. Overall, this dataset serves as a valuable tool for understanding the broader landscape of the Google Play Store and unlocking actionable insights for various stakeholders in the mobile app industry.
Apple App Store dataset to explore detailed information on app popularity, user feedback, and monetization features. Popular use cases include market trend analysis, app performance evaluation, and consumer behavior insights in the mobile app ecosystem.
Use our Apple App Store dataset to gain comprehensive insights into the mobile app ecosystem, including app popularity, user ratings, monetization features, and user feedback. This dataset covers various aspects of apps, such as descriptions, categories, and download metrics, offering a full picture of app performance and trends.
Tailored for marketers, developers, and industry analysts, this dataset allows you to track market trends, identify emerging apps, and refine promotional strategies. Whether you're optimizing app development, analyzing competitive landscapes, or forecasting market opportunities, the Apple App Store dataset is an essential tool for making data-driven decisions in the ever-evolving mobile app industry.
This dataset is versatile and can be used for various applications:
- Market Analysis: Analyze app pricing strategies, monetization features, and category distribution to understand market trends and opportunities in the App Store. This can help developers and businesses make informed decisions about their app development and pricing strategies.
- User Experience Research: Study the relationship between app ratings, number of reviews, and app features to understand what drives user satisfaction. The detailed review data and ratings can provide insights into user preferences and pain points.
- Competitive Intelligence: Track and analyze apps within specific categories, comparing features, pricing, and user engagement metrics to identify successful patterns and market gaps. Particularly useful for developers planning new apps or improving existing ones.
- Performance Prediction: Build predictive models using features like app size, category, pricing, and language support to forecast potential app success metrics. This can help in making data-driven decisions during app development.
- Localization Strategy: Analyze the languages supported and regional performance to inform decisions about app localization and international market expansion.
CUSTOM Please review the respective licenses below: 1. Data Provider's License - Bright Data Master Service Agreement
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Automated Insights Dataset (AID) gathers metadata for the 200 most downloaded free apps in each of the 32 categories on the Google Play Store, totaling 6,400 apps, with information that goes beyond what app stores present, also drawing on metadata from AppBrain. The User Interface Depth Dataset (UID) provides a high-quality sample of the AID, identifying 7,540 components of 50 component types and capturing 1,948 screenshots of the interfaces of 400 apps. The component set was based on components from Google Material Design and Android Studio.
The datasets can be viewed in the spreadsheets named "Automated Insights Dataset (AID).xlsx" and "User Interface Depth Dataset (UID).xlsx".
The "UID - Screenshots.zip" file contains screenshots of the apps present in the UID, organized in folders by app IDs.
The "Source code of the developed tools.zip" file contains Python codes and complementary files used to collect the datasets.
The "Discarded apps.zip" file contains the apps discarded during the analysis; it includes screenshots of some apps, the collected elements, and the reasons these apps were discarded.
The "Data explanation.zip" file contains graphical representations of the UID components and textual descriptions of each data field present in the UID and AID, allowing a better understanding of the criteria used.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We conducted a study to investigate how users' app-specific knowledge influences their preferences for the level of detail and format of software explanations. Participants completed an online survey, which included questions designed to assess both subjective and objective knowledge of software in two app categories: browser and office applications. Objective app-specific knowledge was measured through multiple-choice questions with definitive answers, while subjective familiarity was self-assessed by participants. Preferences for explanation formats (e.g., text, video) and detail levels were also collected.
The dataset includes raw survey responses, processed objective and subjective knowledge scores, and a comprehensive correlation analysis between knowledge levels, demographics, and explanation preferences. We also provide SQL scripts for extracting data on specific variables (e.g., demographics, knowledge scores) to facilitate targeted analyses. The repository is structured to support replication or extension of the study, with organized directories for Excel results, SQL extraction scripts, and raw survey data, including the original LimeSurvey questionnaire in PDF format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset offers thorough app metadata from the Google Play Store and a sentiment analysis of user reviews for each app. The first file (App_Sentiment_Analysis.csv) provides insights into user views and experiences via translated review texts, sentiment classifications, and numerical scores for sentiment polarity and subjectivity. The second file (Review.csv) covers various app attributes, including ratings, review counts, sizes, installation counts, content ratings, genres, and more. Combined, these files allow an in-depth examination of user reviews and app performance, supporting strategies for app recommendation and enhancement. The dataset also references app logo images that can be used for recommendations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The peer-reviewed paper on the AWARE dataset was published at ASEW 2021 and can be accessed at http://doi.org/10.1109/ASEW52652.2021.00049. Kindly cite this paper when using the AWARE dataset.
Aspect-Based Sentiment Analysis (ABSA) aims to identify the opinion (sentiment) expressed toward a specific aspect. Since there is a lack of smartphone app review datasets annotated to support the ABSA task, we present AWARE: ABSA Warehouse of Apps REviews.
AWARE contains apps reviews from three different domains (Productivity, Social Networking, and Games), as each domain has its distinct functionalities and audience. Each sentence is annotated with three labels, as follows:
Aspect Term: a term that appears in the sentence and describes the aspect of the app toward which the sentiment is expressed. A term value of “N/A” means that the term is not explicitly mentioned in the sentence.
Aspect Category: one of the pre-defined set of domain-specific categories that represent an aspect of the app (e.g., security, usability, etc.).
Sentiment: positive or negative.
Note: the Games domain does not contain aspect terms.
We provide a comprehensive dataset of 11323 sentences from the three domains, where each sentence is additionally annotated with a Boolean value indicating whether the sentence expresses a positive/negative opinion. In addition, we provide three separate datasets, one for each domain, containing only sentences that express opinions. The file named “AWARE_metadata.csv” contains a description of the dataset’s columns.
How can AWARE be used?
We designed AWARE such that it can be used to serve various tasks. The tasks can be, but are not limited to:
Sentiment Analysis.
Aspect Term Extraction.
Aspect Category Classification.
Aspect Sentiment Analysis.
Explicit/Implicit Aspect Term Classification.
Opinion/Not-Opinion Classification.
Furthermore, researchers can experiment with and investigate the effects of different domains on users' feedback.
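A sketch of filtering AWARE-style annotations, e.g. for the explicit/implicit aspect term classification task listed above; the example rows are invented, and the field names are assumptions based on the labels described earlier:

```python
# Hypothetical annotated sentences following AWARE's described labels:
# aspect term ("N/A" = implicit), aspect category, and sentiment.
sentences = [
    {"domain": "Productivity", "aspect_term": "sync",
     "aspect_category": "usability", "sentiment": "negative"},
    {"domain": "Social Networking", "aspect_term": "N/A",
     "aspect_category": "security", "sentiment": "positive"},
]

# Explicit vs implicit aspect terms: "N/A" marks an implicit aspect.
explicit = [s for s in sentences if s["aspect_term"] != "N/A"]
implicit = [s for s in sentences if s["aspect_term"] == "N/A"]
negatives = [s for s in sentences if s["sentiment"] == "negative"]
```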
Austin Energy’s free, interactive web app allows customers to monitor their daily energy usage, view their bill history, and see a future forecast of their energy bill cost. They can also set alerts to warn them of an upcoming rate tier change. Austin Energy Web App users can also download Green Button Data which can help them better understand their energy usage and take action towards savings. This data set shows the number of web app users by month. Learn more about the app at http://powersaver.austinenergy.com/wps/portal/psp/residential/learn/free-home-energy-management-options/alerts-and-tips-help-you-manage-your-energy-costs
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
The "Threads, an Instagram app" reviews dataset is a comprehensive collection of user reviews of the Threads mobile app from the Google Play Store and App Store, capturing valuable insights and sentiments. The dataset enables understanding of user satisfaction, evaluation of app performance, and identification of emerging patterns.
Scraping Threads App reviews on Google Play Store & App Store
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There has been an increased emphasis on plant-based foods and diets. Although mobile technology has the potential to be a convenient and innovative tool to help consumers adhere to dietary guidelines, little is known about the content and quality of free, popular mobile health (mHealth) plant-based diet apps. The objective of the study was to assess the content and quality of free, popular mHealth apps supporting plant-based diets for Canadians. Free mHealth apps with high user ratings, a high number of user ratings, availability on both the Apple App Store and Google Play Store, and primarily marketed to help users follow a plant-based diet were included. Using pre-defined search terms, the Apple App Store and Google Play Store were searched on December 22, 2020; the top 100 returns for each search term were screened for eligibility. Included apps were downloaded and assessed for quality by three dietitians/nutrition research assistants using the Mobile App Rating Scale (MARS) and the App Quality Evaluation (AQEL) scale. Of the 998 apps screened, 16 apps (mean user rating±SEM: 4.6±0.1) met the eligibility criteria, comprising 10 recipe managers and meal planners, 2 food scanners, 2 community builders, 1 restaurant identifier, and 1 sustainability assessor. All included apps targeted the general population and focused on changing behaviors using education (15 apps), skills training (9 apps), and/or goal setting (4 apps). Although the MARS (scale: 1–5) revealed adequate overall app quality scores (3.8±0.1), domain-specific assessments revealed high functionality (4.0±0.1) and aesthetics (4.0±0.2) but low credibility scores (2.4±0.1). The AQEL (scale: 0–10) revealed a low overall score for support of knowledge acquisition (4.5±0.4) and adequate scores in the other nutrition-focused domains (6.1–7.6).
Despite the variety of free plant-based apps available with different focuses to help Canadians follow plant-based diets, our findings suggest a need for increased credibility and additional resources to compensate for the weak support of knowledge acquisition among currently available plant-based apps. This research received no specific grant from any funding agency.
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of online malicious activities, including email scams, ransomware, and malicious domains. As the number of afflicted cases continues to surge, malicious campaigns that use the coronavirus as a lure are increasing, and malicious developers take advantage of this opportunity to lure mobile users into downloading and installing malicious apps.
However, beyond a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset exists that researchers can use to boost COVID-19 related cybersecurity studies.
We make an effort to create a daily growing COVID-19 related mobile app dataset. As of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number grows daily, and the dataset is updated weekly. For more details, please visit https://covid19apps.github.io
This dataset includes the following files:
(1) covid19apps.xlsx
In this file, we list information on all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-rank, etc.
(2) covid19apps.zip
The APK samples of the COVID-19 themed apps are packaged in zip files. To reduce the size of any single file, the samples are divided across multiple zip archives. Each APK file is named after its SHA256 hash.
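Since each APK file is named after its SHA256 hash, a downloaded sample can be checked against its filename; a minimal sketch using Python's hashlib (the sample bytes are a placeholder, not a real APK):

```python
import hashlib

def sha256_of_bytes(data):
    """Return the hex SHA256 digest, as used to name each APK sample."""
    return hashlib.sha256(data).hexdigest()

# An extracted sample named "<digest>.apk" can be verified by recomputing
# the digest of its content and comparing it to the filename.
apk_bytes = b"example apk content"  # placeholder content
expected_name = sha256_of_bytes(apk_bytes) + ".apk"
```

For real samples, read the file in binary mode and feed it to `hashlib.sha256` in chunks before comparing the digest to the filename stem.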
If your papers or articles use our dataset, please cite our paper (https://arxiv.org/abs/2005.14619) using the following BibTeX reference:
(Accepted to Empirical Software Engineering)
@misc{wang2021virus,
  title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware},
  author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu},
  year={2021},
  eprint={2005.14619},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario B. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises user feedback data collected from 15 globally acclaimed mobile applications, spanning diverse categories. The included applications are among the most downloaded worldwide, providing a rich and varied source for analysis. The dataset is particularly suitable for Natural Language Processing (NLP) applications, such as text classification and topic modeling. List of Included Applications:
TikTok, Instagram, Facebook, WhatsApp, Telegram, Zoom, Snapchat, Facebook Messenger, CapCut, Spotify, YouTube, HBO Max, Cash App, Subway Surfers, Roblox
Data Columns and Descriptions:
review_id: Unique identifier for each user feedback/application review.
content: User-generated feedback/review in text format.
score: Rating or star given by the user.
TU_count: Number of likes/thumbs up (TU) received for the review.
app_id: Unique identifier for each application.
app_name: Name of the application.
RC_ver: Version of the app when the review was created (RC).
Terms of Use: This dataset is open access for scientific research and non-commercial purposes. Users are required to acknowledge the authors' work and, in the case of scientific publication, cite the most appropriate reference: M. H. Asnawi, A. A. Pravitasari, T. Herawan, and T. Hendrawati, "The Combination of Contextualized Topic Model and MPNet for User Feedback Topic Modeling," in IEEE Access, vol. 11, pp. 130272-130286, 2023, doi: 10.1109/ACCESS.2023.3332644.
Researchers and analysts are encouraged to explore this dataset for insights into user sentiments, preferences, and trends across these top mobile applications. If you have any questions or need further information, feel free to contact the dataset authors.
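A minimal sketch of working with the columns described above, e.g. computing the average score per app; the rows are invented for illustration:

```python
from collections import defaultdict

# Hypothetical reviews in the described column schema (text columns omitted).
reviews = [
    {"app_name": "TikTok", "score": 5, "TU_count": 12},
    {"app_name": "TikTok", "score": 3, "TU_count": 0},
    {"app_name": "Zoom",   "score": 4, "TU_count": 7},
]

# Accumulate total score and count per app, then average.
sums = defaultdict(lambda: [0, 0])  # app_name -> [total score, count]
for r in reviews:
    sums[r["app_name"]][0] += r["score"]
    sums[r["app_name"]][1] += 1
avg_score = {app: total / n for app, (total, n) in sums.items()}
```

Weighting each review by TU_count instead of counting it once would surface the scores the community endorsed most.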