MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows, each representing a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
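The three kinds of noise described above can be handled in one pass over a categorical field. The following is a minimal sketch, not the dataset author's method: it assumes the Device OS field should only contain "Android" or "iOS", and the function name `clean_category` and the 0.7 similarity cutoff are illustrative choices.

```python
import difflib

# Assumed set of valid values for the Device OS field (taken from the description).
VALID_OS = ["Android", "iOS"]

def clean_category(value, valid_values, cutoff=0.7):
    """Map a possibly corrupted value to a valid category, or None if unrecoverable."""
    if value is None or not str(value).strip():
        return None  # missing value
    text = str(value).strip()
    if text in valid_values:
        return text
    # A single-character typo stays close to the original string,
    # while an injected random string usually does not.
    matches = difflib.get_close_matches(text, valid_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical sample values covering clean, typo, missing, and injected cases.
raw = ["Android", "Andro1d", None, "iOS", "x9Qk2LpZ", ""]
cleaned = [clean_category(v, VALID_OS) for v in raw]
print(cleaned)
```

The comparison is case-sensitive, and very short values such as "iOS" tolerate fewer edits at a given cutoff, so the threshold may need tuning per column.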
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.
The global number of KakaoTalk users was forecast to decrease by a total of 0.7 million users between 2024 and 2028. This overall decrease does not happen continuously, notably not in 2026 and 2027. The KakaoTalk user base is estimated to amount to 48.7 million users in 2028. Notably, the number of KakaoTalk users had been increasing continuously over the past years. User figures, here concerning the platform KakaoTalk, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset consists of daily data collected from users of various social media apps. It includes information such as the number of active users, the frequency of user interactions, and the time spent on each app daily. The data is categorized by individual apps, allowing insights into user behavior and engagement patterns on a per-day basis.
The number of WhatsApp users in North America was forecast to increase continuously between 2024 and 2029 by a total of 7.6 million users (+4.97 percent). After the ninth consecutive year of growth, the WhatsApp user base is estimated to reach 160.51 million users, and therefore a new peak, in 2029. Notably, the number of WhatsApp users had been increasing continuously over the past years. User figures, shown here for the platform WhatsApp, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
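The stated increase and percentage can be cross-checked against the forecast end value. The sketch below assumes the percentage is measured relative to the 2024 starting figure, which the text does not give explicitly and is therefore derived here; the function names are illustrative.

```python
def implied_baseline(final_value, absolute_change):
    """Starting value implied by an end value and a total increase."""
    return final_value - absolute_change

def percent_change(final_value, absolute_change):
    """Total increase expressed as a percentage of the implied starting value."""
    return 100.0 * absolute_change / implied_baseline(final_value, absolute_change)

# Figures from the text: 160.51 million users in 2029 after a 7.6 million increase.
baseline = implied_baseline(160.51, 7.6)
growth = percent_change(160.51, 7.6)
print(f"implied 2024 base: {baseline:.2f} million, growth: {growth:.2f}%")
```

Under that assumption the result agrees with the quoted +4.97 percent.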
License information was derived automatically
This repository contains four datasets, purchased from the company Selectivv (https://selectivv.com/), on the number of active users of selected mobile apps. Details of the data are given below:
How data was collected: Selectivv uses programmatic advertising systems that collect information on about 24 million smartphone users in Poland.
Apps:
Transportation: Uber, Bolt Driver, FREE NOW, iTaxi;
Delivery: Glover, Takeaway, Bolt Courier, Wolt;
Unit: an active user of a given app. A user is active if they used the given app for at least 1 minute in a given period, and is counted once per period (e.g. 1 unit for a whole month or half-year).
Period: 2018-2021; monthly and half-year data
Spatial aggregation: country level, city level, functional area level, voivodeship level. Functional area is defined as here https://stat.gov.pl/en/regional-statistics/regional-surveys/urban-audit/larger-urban-zones-luz/
Activity time: measured by activity time of given app (in hours; average and standard deviation)
Datasets:
gig-table1-monthly-counts-stats.csv -- the monthly number of active users;
gig-table2-halfyear-demo-stats.csv -- the half-year number of active users by socio-demographic variables;
gig-table3-halfyear-region-stats.csv -- the half-year number of active users by spatial aggregation;
gig-table4-halfyear-activity-stats.csv -- the half-year activity time by working week, weekend, day (8-18) and night (18-8).
Detailed description:
Structure of gig-table1-monthly-counts-stats.csv:
month - YYYY-MM-DD -- all dates are set to the 15th of the given month, but the data cover the whole month (active users in the whole period); values range from 2018-01-15 to 2021-12-15
app -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
number_of_users -- the number of active users
category -- Transportation, Delivery
Structure of gig-table2-halfyear-demo-stats.csv:
gender -- men, women
age -- 18-30, 31-50, 51-64
country -- Poland, Ukraine, Other
period -- 2018.1, 2018.2, 2019.1, 2019.2, 2020.1, 2021.2
apps -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
number_of_users -- the number of active users
students -- the share of students within a given row
parents_of_children_0_4_years -- the share of parents of children aged 0-4 years in a given row
parents_of_children_5_10_years -- the share of parents of children aged 5-10 years in a given row
women_planning_a_baby -- the share of women planning a baby in a given row
standard -- the share of standard smartphones in a given row
premium_i_phone -- the share of iPhone smartphones in a given row
other_premium -- the share of other premium smartphones in a given row
category -- Transportation, Delivery
Structure of gig-table3-halfyear-region-stats.csv:
group -- Voivodeship, Functional Area, Cities
period -- 2018.1, 2018.2, 2019.1, 2019.2, 2020.1, 2021.2
region_name:
Cities -- Białystok, Bydgoszcz, Gdańsk, Gdynia, Gorzów Wielkopolski, Katowice, Kielce, Kraków, Łódź, Lublin, Olsztyn, Opole, Poznań, Rzeszów, Sopot, Szczecin, Toruń, Warszawa, Wrocław, Zielona Góra
Functional Area -- Functional area - Białystok, Functional area - Bydgoszcz, Functional area - Gorzów Wielkopolski, Functional area - GZM, Functional area - GZM2, Functional area - Kielce, Functional area - Kraków, Functional area - Łódź, Functional area - Lublin, Functional area - Olsztyn, Functional area - Opole, Functional area - Poznań, Functional area - Rzeszów, Functional area - Szczecin, Functional area - Toruń, Functional area - Trójmiasto, Functional area - Warszawa, Functional area - Wrocław, Functional area - Zielona Góra
Voivodeship -- dolnośląskie, kujawsko-pomorskie, łódzkie, lubelskie, lubuskie, małopolskie, mazowieckie, opolskie, podkarpackie, podlaskie, pomorskie, śląskie, świętokrzyskie, warmińsko-mazurskie, wielkopolskie, zachodniopomorskie
apps -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
number_of_users -- the number of active users
category -- Transportation, Delivery
Please note that:
the number of active users in a given functional area = number of active users in a city and a functional area of this city
the number of active users in a voivodeship = number of active users in a city, its functional area and the rest of the voivodeship where this city and functional area are located
More details here: https://stat.gov.pl/en/regional-statistics/regional-surveys/urban-audit/larger-urban-zones-luz/
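The notes above imply that the three spatial levels are nested, so the counts are cumulative rather than additive. The sketch below shows how non-overlapping counts could be derived for one city; it assumes a single city per functional area, and the numbers and the function name `split_nested_counts` are hypothetical. A voivodeship that contains several listed cities would need their functional areas summed first.

```python
def split_nested_counts(city, functional_area, voivodeship):
    """Split cumulative user counts into three non-overlapping parts."""
    if not (city <= functional_area <= voivodeship):
        raise ValueError("expected city <= functional area <= voivodeship")
    return {
        "city": city,
        "functional_area_outside_city": functional_area - city,
        "rest_of_voivodeship": voivodeship - functional_area,
    }

# Hypothetical counts, for illustration only (not taken from the dataset).
parts = split_nested_counts(city=1200, functional_area=1500, voivodeship=2100)
print(parts)
```

Summing the city, functional area and voivodeship rows directly would count city users three times, which is why the differences are taken.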
Structure of gig-table4-halfyear-activity-stats.csv:
period -- 2018.1, 2018.2, 2019.1, 2019.2, 2020.1, 2021.2
apps -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
day -- Mondays-Thursdays, Fridays-Sundays
hour -- day (8-18), night (18-8)
activity_time -- in hours
statistic -- Average, Std.Dev. (standard deviation)
category -- Transportation, Delivery
The number of WhatsApp users in Europe was forecast to increase continuously between 2024 and 2029 by a total of 2.6 million users (+2.17 percent). After the ninth consecutive year of growth, the WhatsApp user base is estimated to reach 122.38 million users, and therefore a new peak, in 2029. Notably, the number of WhatsApp users had been increasing continuously over the past years. User figures, shown here for the platform WhatsApp, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario A. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
TagX Web Browsing Clickstream Data: Unveiling Digital Behavior Across North America and EU
Unique Insights into Online User Behavior
TagX Web Browsing clickstream Data offers an unparalleled window into the digital lives of 1 million users across North America and the European Union. This comprehensive dataset stands out in the market due to its breadth, depth, and stringent compliance with data protection regulations.
What Makes Our Data Unique?
Extensive Geographic Coverage: Spanning two major markets, our data provides a holistic view of web browsing patterns in developed economies.
Large User Base: With 300K active users, our dataset offers statistically significant insights across various demographics and user segments.
GDPR and CCPA Compliance: We prioritize user privacy and data protection, ensuring that our data collection and processing methods adhere to the strictest regulatory standards.
Real-time Updates: Our clickstream data is continuously refreshed, providing up-to-the-minute insights into evolving online trends and user behaviors.
Granular Data Points: We capture a wide array of metrics, including time spent on websites, click patterns, search queries, and user journey flows.
Data Sourcing: Ethical and Transparent
Our web browsing clickstream data is sourced through a network of partnered websites and applications. Users explicitly opt in to data collection, ensuring transparency and consent. We employ advanced anonymization techniques to protect individual privacy while maintaining the integrity and value of the aggregated data. Key aspects of our data sourcing process include:
Voluntary user participation through clear opt-in mechanisms
Regular audits of data collection methods to ensure ongoing compliance
Collaboration with privacy experts to implement best practices in data anonymization
Continuous monitoring of regulatory landscapes to adapt our processes as needed
Primary Use Cases and Verticals
TagX Web Browsing clickstream Data serves a multitude of industries and use cases, including but not limited to:
Digital Marketing and Advertising:
Audience segmentation and targeting
Campaign performance optimization
Competitor analysis and benchmarking
E-commerce and Retail:
Customer journey mapping
Product recommendation enhancements
Cart abandonment analysis
Media and Entertainment:
Content consumption trends
Audience engagement metrics
Cross-platform user behavior analysis
Financial Services:
Risk assessment based on online behavior
Fraud detection through anomaly identification
Investment trend analysis
Technology and Software:
User experience optimization
Feature adoption tracking
Competitive intelligence
Market Research and Consulting:
Consumer behavior studies
Industry trend analysis
Digital transformation strategies
Integration with Broader Data Offering
TagX Web Browsing clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:
Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints.
Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey.
Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts.
Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.
By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives.
Data Quality and Scale
We pride ourselves on delivering high-quality, reliable data at scale:
Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions.
Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency.
Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage.
Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies.
Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.
Empowering Data-Driven Decision Making
In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our cli...
The S3 dataset contains the behavior (sensors, statistics of applications, and voice) of 21 volunteers interacting with their smartphones for more than 60 days. The users are diverse: males and females aged from 18 to 70 were included when generating the dataset. The wide age range is a key aspect, given the impact of age on smartphone usage. To generate the dataset, the volunteers installed a prototype of the smartphone application on their Android mobile phones.
All attributes of the different kinds of data are written in a vector. The dataset contains the following vectors:
Sensors:
This type of vector contains data from the smartphone sensors (accelerometer and gyroscope) acquired in a given window of time. A vector is obtained every 20 seconds, and the monitored features are:
- Average of accelerometer and gyroscope values.
- Maximum and minimum of accelerometer and gyroscope values.
- Variance of accelerometer and gyroscope values.
- Peak-to-peak (max-min) of X, Y, Z coordinates.
- Magnitude for gyroscope and accelerometer.
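The window features listed above can be illustrated for one sensor and one time window. This is a sketch under stated assumptions, not the authors' extraction code: the source does not say whether population or sample variance is used, nor how the magnitude is aggregated, so population variance and the mean per-sample Euclidean magnitude are assumed here. The function names and sample values are hypothetical.

```python
import math
import statistics

def axis_features(values):
    """Summary statistics for one axis within one time window."""
    return {
        "mean": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "variance": statistics.pvariance(values),  # population variance (assumption)
        "peak_to_peak": max(values) - min(values),
    }

def window_features(samples):
    """samples: list of (x, y, z) readings from one sensor in one window."""
    xs, ys, zs = zip(*samples)
    features = {axis: axis_features(vals) for axis, vals in zip("xyz", (xs, ys, zs))}
    # Euclidean magnitude per sample, averaged over the window (assumption).
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    features["mean_magnitude"] = statistics.fmean(magnitudes)
    return features

# Hypothetical accelerometer readings for a single window.
window = [(0.0, 0.0, 1.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
features = window_features(window)
print(features["x"], features["mean_magnitude"])
```

The same function would be applied separately to the accelerometer and gyroscope streams, and the results concatenated into one vector per window.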
Statistics:
These vectors contain data about the applications recently used by the user. Each statistics vector is calculated every 60 seconds and contains:
- Foreground application counters (number of different and total apps) for the last minute and the last day.
- Most common app ID and the number of usages in the last minute and the last day.
- ID of the currently active app.
- ID of the last active app prior to the current one.
- ID of the application most frequently used prior to the current application.
- Bytes transmitted and received through the network interfaces.
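The foreground application counters described above can be sketched with a sliding time window over foreground events. The event format (timestamp in seconds plus an app ID), the function name `app_usage_stats`, and the sample events are hypothetical; the source does not describe how the prototype app records them.

```python
from collections import Counter

def app_usage_stats(foreground_events, now, window_seconds):
    """Count foreground apps seen in the window (now - window_seconds, now].

    foreground_events: list of (timestamp_seconds, app_id) tuples.
    """
    recent = [app for ts, app in foreground_events if now - window_seconds < ts <= now]
    counts = Counter(recent)
    most_common = counts.most_common(1)
    return {
        "distinct_apps": len(counts),
        "total_app_events": len(recent),
        "most_common_app": most_common[0][0] if most_common else None,
        "most_common_count": most_common[0][1] if most_common else 0,
    }

# Hypothetical foreground events within one minute.
events = [(5, "mail"), (20, "chat"), (30, "chat"), (50, "maps"), (58, "chat")]
last_minute = app_usage_stats(events, now=60, window_seconds=60)
print(last_minute)
```

Calling the same function with `window_seconds=86400` would give the corresponding last-day counters.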
Voice:
This kind of vector is generated when the microphone is active in a call or voice note. The speaker vector is an embedding, extracted from the audio, that contains information about the user's identity. This vector is usually called an "x-vector" in the speaker recognition field, and it is calculated following the steps detailed in "egs/sitw/v2" for the Kaldi library, with the models available for the extraction of the embedding.
A summary of the details of the collected database:
- Users: 21
- Sensor vectors: 417,128
- App usage statistics vectors: 151,034
- Speaker vectors: 2,720
- Call recordings: 629
- Voice messages: 2,091
As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.
Instagram users
With roughly one billion monthly active users, Instagram is one of the most popular social networks worldwide. The social photo sharing app is especially popular in India and in the United States, which have 362.9 million and 169.7 million Instagram users, respectively.
Instagram features
One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream, and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo sharing app that initially became famous due to its “vanishing photos” feature.
As of the second quarter of 2021, Snapchat had 293 million daily active users.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Apple App Store Key Statistics
Apps & Games in the Apple App Store
Apps in the Apple App Store
Games in the Apple App Store
Most Popular Apple App Store Categories
Paid vs Free Apps in Apple App...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Get the count of unique users that connected to Exchange Online using any email app.
Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.
The Portuguese footballer is the most-followed person on the photo sharing app platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.
How popular is Instagram?
Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.
Who uses Instagram?
Instagram audiences are predominantly young; recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.
Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The study aimed to investigate the long-term impact of experiences on user engagement with a food reporting mobile game app. The study recruited 10 participants, 8 of whom completed the study. The study period covered at least 6 weeks of continuous use of the DigestInn application. A one-year licence of the DigestInn mobile app was given for free to each participant. A mixed dataset was collected:
Daily mood reporting: the Experience Sampling Method [1] was used to sample participants' daily mood towards their experience using the application. WhatsApp [2] and the visual Pick-A-Mood tool [3] were used to prompt participants daily.
Weekly user engagement reporting: a user engagement scale was used and adjusted for this purpose [4]. The survey was implemented in TypeForm [5]. The prompt/reminder was sent through WhatsApp via a visual summary of the mood reporting, based on the Day Reconstruction Method [6].
6-week interviews: individual interviews were conducted in person and via Skype. Focus groups were conducted on the premises of Arnhem-Nijmegen Applied Science University. In all cases, visual prompts of food and mood reports were presented as probes [6].
Raw data was processed for analysis.
Coded transcripts: two student assistants and a code manager processed the transcripts using the software Atlas.ti [7] version 8.4.4. A coding scheme was initially developed, and the code manager trained the student assistants until an inter-coder reliability higher than .9 was achieved [8].
Parsed JSON files: a JSON file containing the complete dataset for the whole study period was parsed to extract each participant's food reports. First, the file was split into 8 files (one for each participant). A Python program and a bash script were developed on Mac OS X to parse the JSON files into .csv files. In Excel, the .csv files were parsed by means of two Visual Basic macros to obtain a tabular view of the food reports per participant.
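The splitting and conversion steps described above can be sketched as follows. This is not the study's actual program: the structure of the DigestInn export is not described in the source, so the record layout, the field names (`participant`, `date`, `food`) and the sample values are all hypothetical, and the CSV text is built in memory rather than written to files.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical export: a flat list of food reports (real format unknown).
raw_json = """
[
  {"participant": "P1", "date": "2019-03-01", "food": "oatmeal"},
  {"participant": "P2", "date": "2019-03-01", "food": "salad"},
  {"participant": "P1", "date": "2019-03-02", "food": "soup"}
]
"""

def split_by_participant(records):
    """Group records by participant, mirroring the one-file-per-participant step."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["participant"]].append(record)
    return dict(grouped)

def to_csv_text(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

grouped = split_by_participant(json.loads(raw_json))
csv_by_participant = {
    pid: to_csv_text(rows, ["participant", "date", "food"])
    for pid, rows in grouped.items()
}
print(csv_by_participant["P1"])
```

Writing each value of `csv_by_participant` to its own file would reproduce the per-participant .csv layout that the description mentions.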
[1] Larson, R., & Csikszentmihalyi, M. (2014). The experience sampling method. In Flow and the foundations of positive psychology (pp. 21-34). Springer, Dordrecht.
[2] https://www.whatsapp.com
[3] Desmet, P., Vastenburg, M., & Romero, N. (2016). Mood measurement with Pick-A-Mood: review of current methods and design of a pictorial self-report scale. Journal of Design Research, 14(3), 241-279.
[4] O’Brien, H. L., Cairns, P., & Hall, M. (2018). A practical approach to measuring user engagement with the refined user engagement scale (UES) and new UES short form. International Journal of Human-Computer Studies, 112, 28-39.
[5] https://www.typeform.com/
[6] Kahneman, D., Krueger, A. B., Schkade, D. A., Schwarz, N., & Stone, A. A. (2004). A survey method for characterizing daily life experience: The day reconstruction method. Science, 306(5702), 1776-1780.
[7] Atlas.ti
[8] Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Sage.
CAL FIRE's Fire and Resource Assessment Program (FRAP) annually maintains and distributes a historical fire perimeter dataset covering public and private lands across California. The GIS data is developed with the cooperation of the United States Forest Service Region 5, the Bureau of Land Management, the National Park Service and the United States Fish and Wildlife Service, and is released in the spring with added data from the previous calendar year. Although the dataset represents the most complete digital record of fire perimeters in California, it is still incomplete, and users should be cautious when drawing conclusions based on the data.
This app contains three pages of maps and documentation of the historical fire perimeter metadata:
Historical Fire Perimeters: The landing page highlights the recent large fires (≥5,000 acres) on a backdrop of all of the dataset's documented fire perimeters dating back to 1878. This map includes perimeters symbolized by decade, county boundaries, California Vegetation, and NAIP imagery back to 2005. This page lets users add their own data or filter the fire perimeter data. It lists the fire perimeters shown on the map with their name, year, and GIS-calculated acreage. The user can navigate to the CAL FIRE current incident webpage or provide comments to the dataset's steward.
Times Burned: The second page provides a map showing an analysis performed annually on the fire perimeter dataset to showcase burn frequency from 1950 to present for fires greater than one acre.
Fire Across Time: This third page provides a time-enabled layer of the fire perimeter dataset, featuring a time slider that allows users to view the perimeter dataset across time.
The final page provides the user with the dataset's metadata, including its most current data dictionary.
For any questions, please contact the data steward: Kim Wallin, GIS Specialist, CAL FIRE, Fire & Resource Assessment Program (FRAP), kimberly.wallin@fire.ca.gov
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Overview
Spotify is one of the largest music streaming service providers, with over 422 million monthly active users, including 182 million paying subscribers, as of March 2022. Some of them don't hesitate to share their experience using this application, along with a rating that denotes how satisfied they are with the application.
The way data was collected
Scraping Spotify reviews on Google Play Store
Ideas for using this dataset
Sentiment analysis
What makes the application receive 1-star and 5-star ratings
Original Data Source: Spotify App Reviews
The confluence of Search and Recommendation (S&R) services is a vital aspect of online content platforms like Kuaishou and TikTok. The integration of S&R modeling is a highly intuitive approach adopted by industry practitioners. However, there is a noticeable lack of research in this area within academia, primarily due to the absence of publicly available datasets. Consequently, a substantial gap has emerged between academia and industry regarding research endeavors in this field. To bridge this gap, we introduce the first large-scale, real-world dataset KuaiSAR of integrated Search And Recommendation behaviors collected from Kuaishou, a leading short-video app in China with over 300 million daily active users. Previous research in this field has predominantly employed publicly available datasets that are semi-synthetic and simulated, with artificially fabricated search behaviors. Distinct from previous datasets, KuaiSAR records genuine user behaviors, the occurrence of each interaction within either search or recommendation service, and the users’ transitions between the two services. This work aids in joint modeling of S&R, and the utilization of search data for recommenders (and recommendation data for search engines). Additionally, due to the diverse feedback labels of user-video interactions, KuaiSAR also supports a wide range of other tasks, including intent recommendation, multi-task learning, and long sequential multi-behavior modeling. We believe this dataset will facilitate innovative research and enrich our understanding of S&R services integration in real-world applications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Playstore Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/madhav000/playstore-analysis on 30 September 2021.
--- Dataset description provided by original source is as follows ---
Google Play Store team had launched a new feature wherein, certain apps that are promising, are boosted in visibility. The boost will manifest in multiple ways including higher priority in recommendations sections (“Similar apps”, “You might also like”, “New and updated games”). These will also get a boost in search results visibility. This feature will help bring more attention to newer apps that have the potential.
The problem is to identify the apps that are going to be good for Google to promote. App ratings, which are provided by the customers, are always a good indicator of the quality of an app. The problem reduces to: predict which apps will have high ratings.
Google Play Store team is about to launch a new feature wherein, certain apps that are promising, are boosted in visibility. The boost will manifest in multiple ways including higher priority in recommendations sections (“Similar apps”, “You might also like”, “New and updated games”). These will also get a boost in search results visibility. This feature will help bring more attention to newer apps that have the potential.
Dataset: Google Play Store data (“googleplaystore.csv”)
Fields in the data:
App: Application name
Category: Category to which the app belongs
Rating: Overall user rating of the app
Reviews: Number of user reviews for the app
Size: Size of the app
Installs: Number of user downloads/installs for the app
Type: Paid or Free
Price: Price of the app
Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult
Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to Music, Game, Family genres.
Last Updated: Date when the app was last updated on Play Store
Current Ver: Current version of the app available on Play Store
Android Ver: Minimum required Android version
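As a rough illustration of the prediction target described above, the following Python sketch derives a binary "high rating" label from the Rating field. The 4.0 cutoff, the function name, and the sample rows are assumptions made for illustration; the source does not define what counts as a high rating.

```python
import csv
import io

# Hypothetical cutoff: the source does not say what counts as a "high" rating.
HIGH_RATING_THRESHOLD = 4.0


def label_high_rating(rows, threshold=HIGH_RATING_THRESHOLD):
    """Return (app, is_high) pairs, skipping rows whose Rating is not a number."""
    labeled = []
    for row in rows:
        try:
            rating = float(row["Rating"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed rating: no label can be derived
        if rating != rating:  # NaN compares unequal to itself
            continue
        labeled.append((row["App"], rating >= threshold))
    return labeled


# Made-up sample rows that reuse the column names listed above.
SAMPLE_CSV = """App,Category,Rating,Reviews,Type
Example App A,TOOLS,4.5,1200,Free
Example App B,GAME,3.1,85,Paid
Example App C,FAMILY,,10,Free
"""

sample_labels = label_high_rating(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Rows without a usable rating are dropped rather than labeled, since they carry no information about the target.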
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
This dataset was collected from first-generation immigrants between 2022 and 2023. Over a 28-day period, 39 participants aged 18 to 65, fluent in English and experiencing loneliness (UCLA Loneliness Scale score ≥ 28), contributed to the study. Data collection utilized the Samsung Watch Active 2, the Oura Ring, AWARE, and the Centralive smartphone application. This dataset contains raw data from photoplethysmogram (PPG), inertial measurement unit (IMU) readings, air pressure, and processed data on heart rate, heart rate variability, sleep metrics (bedtime, stages, quality), physical activity (steps, active calories, activity types), and smartphone usage patterns (screen time, notifications, call and message logs). Participants also completed ecological momentary assessments (EMA) and weekly surveys, including instruments like the Beck Depression Inventory (BDI), Patient Health Questionnaire-9 (PHQ-9), Perceived Stress Scale, Sense of Coherence Scale, Social Connectedness Scale, Twente Engagement with E-Health Technologies questionnaire, and the UCLA Loneliness Scale. This dataset can be used to study the interplay between loneliness, mental well-being, and daily behaviors of immigrants in a real-world context.
Methods
Design and set up
This study was designed to create a longitudinal dataset capturing physiological, behavioral, and psychological data from first-generation immigrants living in Finland. The dataset aims to support research on the relationship between mental health and daily lifestyle factors, providing a foundation for further detection algorithm development. To achieve this, the study collected multimodal data over a 28-day period from every participant. Objective data were gathered from wearable devices, which recorded sleep patterns, physical activity, cardiovascular health metrics, and raw PPG signals. Passive smartphone data, such as screen usage, notifications, calls, and messages, were also collected to capture digital behavior patterns.
Subjective data were collected through EMAs delivered via push notifications and weekly self-report surveys. These assessments measured daily emotional states: loneliness, stress, depression, and social connectedness. By integrating multiple data sources, this dataset allows researchers to explore the complex interactions between mental health and lifestyle behaviors under free-living conditions.
Data collection
To facilitate continuous data collection and remote monitoring, Centralive was used. Centralive is a digital health platform designed for continuous data collection, data storage, real-time monitoring, and remote management of participant engagement throughout the study. Data were collected using different applications and wearable devices, all centralized in the Centralive system. The collected data were then transferred to and stored on Centralive’s cloud server. The Centralive dashboard was used to monitor the collected data and participants’ engagement during data collection. To collect the subjective daily EMAs and weekly surveys, Centralive prompted every participant with the daily EMAs at 8 a.m., 2 p.m., 5 p.m., 8 p.m., and 10 p.m. The daily EMA contains questions focusing on participants’ current emotions, including feelings of loneliness, social connectedness, and affect. The weekly EMA was prompted every Sunday and was open from 12 a.m. to 11:59 p.m. The Samsung Watch Active 2, equipped with the open-source Tizen operating system (TizenOS), was used to collect objective physiological signals. The device recorded photoplethysmography (PPG), accelerometer, and gyroscope data at a sampling rate of 20 Hz, while air pressure measurements were captured at 10 Hz. Data collection was scheduled at two-hour intervals, with each recording session lasting 12 minutes. The Oura Ring was used to track participants' sleep and activity patterns throughout the study.
Data collected by the Oura Ring included sleep and activity metrics, as well as cardiac metrics such as heart rate and heart rate variability sensed during sleep. Centralive utilized Open Authentication to securely access and retrieve these data, making them available to researchers on a daily basis for further analysis. The AWARE framework was used to collect passive phone activity data. The AWARE app ran in the background on participants’ smartphones, continuously logging data without requiring active user input. The collected data included battery usage patterns, recording charging events and power consumption to monitor device usage trends. Call logs were also recorded, tracking incoming and outgoing calls with metadata such as timestamps and call duration, but without capturing conversation content. Similarly, message logs documented sent and received text messages, preserving metadata while ensuring privacy. Notifications data provided insights into participants’ digital engagement by logging received notifications, including app source and timestamps. Screen usage patterns were...
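The watch recording schedule described above (20 Hz for PPG, accelerometer, and gyroscope; 10 Hz for air pressure; 12-minute sessions every two hours) implies a nominal data volume that can be worked out as in the sketch below. It assumes an ideal schedule that runs around the clock with no missed or shortened sessions, which the source does not claim; the variable names are illustrative.

```python
# Values stated in the dataset description.
SESSION_MINUTES = 12
SESSION_INTERVAL_HOURS = 2
PPG_IMU_RATE_HZ = 20      # PPG, accelerometer, gyroscope
PRESSURE_RATE_HZ = 10     # air pressure

# Assumption: sessions run around the clock without gaps.
session_seconds = SESSION_MINUTES * 60
sessions_per_day = 24 // SESSION_INTERVAL_HOURS

# Nominal sample counts per recording session, per sensor stream.
samples_per_session_20hz = PPG_IMU_RATE_HZ * session_seconds
samples_per_session_10hz = PRESSURE_RATE_HZ * session_seconds

# Share of each day that is actually recorded.
recorded_minutes_per_day = sessions_per_day * SESSION_MINUTES
duty_cycle = recorded_minutes_per_day / (24 * 60)

print(f"{sessions_per_day} sessions/day, "
      f"{samples_per_session_20hz} samples/session at 20 Hz, "
      f"{samples_per_session_10hz} samples/session at 10 Hz, "
      f"duty cycle {duty_cycle:.0%}")
```

Under these assumptions the watch covers about a tenth of each day, which is worth keeping in mind when aligning the raw signals with EMA responses.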
https://www.factori.ai/privacy-policy
Mobility data is collected through location-aware mobile apps using an SDK-based implementation. Users explicitly consent to share their location data via a clear opt-in process and are provided with clear opt-out options. Factori ingests, cleans, validates, and exports all location data signals to ensure the highest quality data is available for analysis.
Our data reach encompasses the total counts available across various categories, including attributes such as country location, MAU (Monthly Active Users), DAU (Daily Active Users), and Monthly Location Pings.
We collect data dynamically, offering the most updated data and insights at the best-suited intervals (daily, weekly, monthly, or quarterly).
Our data supports various business needs, including consumer insight, market intelligence, advertising, and retail analytics.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset has been artificially generated to mimic real-world user interactions within a mobile application. It contains 100,000 rows of data, each row of which represents a single event or action performed by a synthetic user. The dataset was designed to capture many of the attributes commonly tracked by app analytics platforms, such as device details, network information, user demographics, session data, and event-level interactions.
User & Session Metadata
User ID: A unique integer identifier for each synthetic user.
Session ID: Randomly generated session identifiers (e.g., S-123456), capturing the concept of user sessions.
IP Address: Fake IP addresses generated via Faker to simulate different network origins.
Timestamp: Randomized timestamps (within the last 30 days) indicating when each interaction occurred.
Session Duration: An approximate measure (in seconds) of how long a user remained active.
Device & Technical Details
Device OS & OS Version: Simulated operating systems (Android/iOS) with plausible version numbers.
Device Model: Common phone models (e.g., “Samsung Galaxy S22,” “iPhone 14 Pro,” etc.).
Screen Resolution: Typical screen resolutions found in smartphones (e.g., “1080x1920”).
Network Type: Indicates whether the user was on Wi-Fi, 5G, 4G, or 3G.
Location & Locale
Location Country & City: Random global locations generated using Faker.
App Language: Represents the user’s app language setting (e.g., “en,” “es,” “fr,” etc.).
User Properties
Battery Level: The phone’s battery level as a percentage (0–100).
Memory Usage (MB): Approximate memory consumption at the time of the event.
Subscription Status: Boolean flag indicating if the user is subscribed to a premium service.
User Age: Random integer ranging from teenagers to seniors (13–80).
Phone Number: Fake phone numbers generated via Faker.
Push Enabled: Boolean flag indicating if the user has push notifications turned on.
Event-Level Interactions
Event Type: The action taken by the user (e.g., “click,” “view,” “scroll,” “like,” “share,” etc.).
Event Target: The UI element or screen component interacted with (e.g., “home_page_banner,” “search_bar,” “notification_popup”).
Event Value: A numeric field indicating additional context for the event (e.g., intensity, count, rating).
App Version: Simulated version identifier for the mobile application (e.g., “4.2.8”).
Data Quality & “Noise”
To better approximate real-world data, 1% of all fields have been intentionally “corrupted” or altered:
Typos and Misspellings: Random single-character edits, e.g., “Andro1d” instead of “Android.”
Missing Values: Some cells might be blank (None) to reflect dropped or unrecorded data.
Random String Injections: Occasional random alphanumeric strings inserted where they don’t belong.
These intentional discrepancies can help data scientists practice data cleaning, outlier detection, and data wrangling techniques.
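One possible way to handle the three kinds of noise described above in a categorical column such as Device OS is sketched below in Python. It uses the standard-library `difflib.get_close_matches` to map single-character typos back to a known vocabulary and treats blanks and unmatched strings as missing. The function name, the vocabulary list, and the 0.8 similarity cutoff are assumptions chosen for illustration, not part of the dataset.

```python
from difflib import get_close_matches

# Assumed vocabulary for the Device OS column, taken from the description.
KNOWN_OS = ["Android", "iOS"]


def clean_categorical(value, vocabulary, cutoff=0.8):
    """Map a noisy value onto a known vocabulary, or return None.

    - None or blank strings are treated as missing values.
    - Exact matches are returned unchanged.
    - Near matches (e.g. single-character typos) are repaired.
    - Anything else (e.g. an injected random string) becomes None.
    """
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    if text in vocabulary:
        return text
    matches = get_close_matches(text, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


noisy_values = ["Android", "Andro1d", None, "  ", "x9Qz7", "iOS"]
cleaned_values = [clean_categorical(v, KNOWN_OS) for v in noisy_values]
```

A similarity cutoff works less well for very short values, where one changed character is a large share of the string, so the threshold may need tuning per column.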
Data Cleaning & Preprocessing: Ideal for practicing how to handle missing values, inconsistent data, and noise in a realistic scenario.
Analytics & Visualization: Demonstrate user interaction funnels, session durations, usage by device/OS, etc.
Machine Learning & Modeling: Suitable for building classification or clustering models (e.g., user segmentation, event classification).
Simulation for Feature Engineering: Experiment with deriving new features (e.g., session frequency, average battery drain, etc.).
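As an example of the feature engineering mentioned above, the sketch below derives a simple session-frequency feature: the number of distinct sessions per user. The keys `user_id` and `session_id` are hypothetical stand-ins for the User ID and Session ID columns, whose exact names in the file are not given here, and the sample events are made up.

```python
from collections import defaultdict


def sessions_per_user(events):
    """Count distinct session IDs per user, ignoring rows with missing IDs."""
    sessions = defaultdict(set)
    for event in events:
        user_id = event.get("user_id")
        session_id = event.get("session_id")
        if user_id is None or session_id is None:
            continue  # noise rows with missing identifiers are skipped
        sessions[user_id].add(session_id)
    return {user: len(ids) for user, ids in sessions.items()}


# Made-up events: one dictionary per row of the event log.
sample_events = [
    {"user_id": 1, "session_id": "S-100001"},
    {"user_id": 1, "session_id": "S-100001"},
    {"user_id": 1, "session_id": "S-100002"},
    {"user_id": 2, "session_id": "S-200001"},
    {"user_id": 2, "session_id": None},
]

session_counts = sessions_per_user(sample_events)
```

Using a set per user means repeated events within the same session are counted once, so the feature reflects sessions rather than raw event volume.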
Synthetic Data: All entries (users, device info, IPs, phone numbers, etc.) are artificially generated and do not correspond to real individuals.
Privacy & Compliance: Since no real personal data is present, there are no direct privacy concerns. However, always handle synthetic data ethically.