License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Verified dataset of 2025 device usage: share of global web traffic, mobile commerce share of transactions, US daily time spent, app vs web breakdown, and tablet decline.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Inspirations:
Some potential Kaggle datasets or competitions that might inspire your project include:
Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.
Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.
University of Illinois at Urbana-Champaign, https://interactionmining.org/rico
The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
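The construction described above can be sketched in a few lines. The element schema below (a flat list of leaves with pixel bounds and a text flag) is a simplified stand-in for Rico's actual view-hierarchy JSON, and the two-channel raster is one plausible way to differentiate text from non-text elements:

```python
import numpy as np

def layout_image(leaves, screen_w=1440, screen_h=2560, out_w=90, out_h=160):
    """Rasterize leaf bounding boxes into a 2-channel layout image
    (channel 0 = text elements, channel 1 = non-text elements)."""
    img = np.zeros((2, out_h, out_w), dtype=np.float32)
    sx, sy = out_w / screen_w, out_h / screen_h
    for leaf in leaves:
        x0, y0, x1, y1 = leaf["bounds"]          # pixel coordinates on screen
        c = 0 if leaf["is_text"] else 1
        # Scale the box down, keeping at least a 1-pixel footprint.
        img[c,
            int(y0 * sy):max(int(y1 * sy), int(y0 * sy) + 1),
            int(x0 * sx):max(int(x1 * sx), int(x0 * sx) + 1)] = 1.0
    return img
```

An autoencoder trained on such images compresses each screen to a fixed-length vector (64-dimensional in the paper), and nearest neighbors in that embedding space provide the query-by-example search.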
The smartphone helps workers balance the demands of their professional and personal lives but can also be a distraction, affecting productivity, wellbeing, and work-life balance. Drawing from insights on the impact of physical environments on object engagement, this study examines how the distance between the smartphone and the user influences interactions in work contexts. Participants (N = 22) engaged in two 5h knowledge work sessions on the computer, with the smartphone placed outside their immediate reach during one session. Results show that limited smartphone accessibility led to reduced smartphone use, but participants shifted non-work activities to the computer and the time they spent on work and leisure activities overall remained unchanged. These findings suggest that discussions on smartphone disruptiveness in work contexts should consider the specific activities performed, challenging narratives of ‘smartphone addiction’ and ‘smartphone overuse’ as the cause of increased disruptions and lowered work productivity.
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
It is no secret that mobile devices are steadily taking over the market at the expense of desktop equipment and many forgotten tablets. Trends change over time, and the collected data helps us understand them. So let's look at the share of these three segments in the most populous country in the world, India.
The dataset, saved in .csv form, contains 4 columns. The first column contains the date (YYYY-MM) of the measurement period. Each subsequent column contains the market share for mobile, desktop, and tablet, given as a percentage rounded to 2 decimal places (if the share is less than 0.5%, the value remains 0, even though it may represent a very small nonzero share). There are 180 rows in total, i.e. a full 15 years of data, one row per month.
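A file with this layout can be loaded and sanity-checked with pandas. The rows and column names below are invented for illustration; substitute the actual file and header:

```python
import io
import pandas as pd

# A few illustrative rows in the dataset's layout (values and column
# names are made up for this sketch, not taken from the real file).
csv_text = """Date,Mobile,Desktop,Tablet
2009-01,0.83,98.42,0
2016-06,60.21,37.55,2.24
2023-12,77.12,21.33,1.55
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Shares per month should sum to roughly 100%; rounding and the
# below-threshold-as-0 convention leave small deviations.
totals = df[["Mobile", "Desktop", "Tablet"]].sum(axis=1)
print(totals.round(2).tolist())
```

For the real file, replacing `io.StringIO(csv_text)` with the CSV path is enough; a quick `len(df) == 180` check then confirms the full 15 years of monthly rows arrived intact.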
The data comes from the statcounter website and is available under the CC BY-SA 3.0 license, which allows you to copy, use, and distribute the data, including for commercial purposes, provided you cite the source.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes network traffic data from more than 50 Android applications across 5 different scenarios. The applications are consistent in all scenarios, but other factors like location, device, and user vary (see Table 2 in the paper). The current repository pertains to Scenario A. Within the repository, for each application, there is a compressed file containing the relevant PCAP files. The PCAP files follow the naming convention: {Application Name}{Scenario ID}{#Trace}_Final.pcap.
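A small parser for that naming convention might look like this. The regex assumes underscore separators between the three fields (as suggested by the trailing `_Final`) and a single-letter scenario ID; both are assumptions to adjust against the actual files:

```python
import re

# Assumed pattern: {Application Name}_{Scenario ID}_{#Trace}_Final.pcap
PCAP_NAME = re.compile(r"^(?P<app>.+)_(?P<scenario>[A-Z])_(?P<trace>\d+)_Final\.pcap$")

def parse_pcap_name(filename):
    """Split a PCAP filename into (application, scenario, trace number)."""
    m = PCAP_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected PCAP name: {filename!r}")
    return m.group("app"), m.group("scenario"), int(m.group("trace"))
```

Because the application-name group is greedy, app names that themselves contain underscores still parse correctly; the scenario letter and trace number are taken from the last matching fields before `_Final.pcap`.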
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.
Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.
User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.
Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.
GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.
Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.
High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.
Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.
Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
English (North America) scripted monologue smartphone and PC speech dataset, collected from monologues based on given scripts and covering common expressions. Transcribed with text content and other attributes. The data was collected from an extensive and geographically diverse pool of speakers (302 North Americans), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures random yet realistic smartphone usage behavior of 50 users, including their daily screen time, app opens, primary app category, notifications received, and battery usage. It can be used for mobile analytics, user behavior research, productivity improvement studies, and predictive modeling.
AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites
Overview
Unlock the next generation of agentic commerce and automated shopping experiences with this comprehensive dataset of meticulously annotated checkout flows, sourced directly from leading retail, restaurant, and marketplace websites. Designed for developers, researchers, and AI labs building large language models (LLMs) and agentic systems capable of online purchasing, this dataset captures the real-world complexity of digital transactions—from cart initiation to final payment.
Key Features
Breadth of Coverage: Over 10,000 unique checkout journeys across hundreds of top e-commerce, food delivery, and service platforms, including but not limited to Walmart, Target, Kroger, Whole Foods, Uber Eats, Instacart, Shopify-powered sites, and more.
Actionable Annotation: Every flow is broken down into granular, step-by-step actions, complete with timestamped events, UI context, form field details, validation logic, and response feedback. Each step includes:
Page state (URL, DOM snapshot, and metadata)
User actions (clicks, taps, text input, dropdown selection, checkbox/radio interactions)
System responses (AJAX calls, error/success messages, cart/price updates)
Authentication and account linking steps where applicable
Payment entry (card, wallet, alternative methods)
Order review and confirmation
Multi-Vertical, Real-World Data: Flows sourced from a wide variety of verticals and real consumer environments, not just demo stores or test accounts. Includes complex cases such as multi-item carts, promo codes, loyalty integration, and split payments.
Structured for Machine Learning: Delivered in standard formats (JSONL, CSV, or your preferred schema), with every event mapped to action types, page features, and expected outcomes. Optional HAR files and raw network request logs provide an extra layer of technical fidelity for action modeling and RLHF pipelines.
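With a JSONL delivery, each line is one event, so a flow can be grouped and replayed step by step. The field names below (`flow_id`, `step`, `action_type`, `url`) are illustrative placeholders, not the dataset's actual schema:

```python
import io
import json
from collections import defaultdict

# Two illustrative events; field names and values are assumptions,
# not the dataset's real schema.
jsonl = io.StringIO(
    '{"flow_id": "f1", "step": 1, "action_type": "click", "url": "https://shop.example/cart"}\n'
    '{"flow_id": "f1", "step": 2, "action_type": "text_input", "url": "https://shop.example/checkout"}\n'
)

# Group events by flow, then order each flow by step for replay.
flows = defaultdict(list)
for line in jsonl:
    event = json.loads(line)
    flows[event["flow_id"]].append(event)

for flow_id, events in flows.items():
    events.sort(key=lambda e: e["step"])
    print(flow_id, [e["action_type"] for e in events])
```

The same grouping pattern works on the real files by swapping the `StringIO` for an open file handle; the per-flow ordered action list is then ready for sequence modeling or replay-based testing.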
Rich Context for LLMs and Agents: Every annotation includes both human-readable and model-consumable descriptions:
“What the user did” (natural language)
“What the system did in response”
“What a successful action should look like”
Error/edge case coverage (invalid forms, out-of-stock items, address/payment errors)
Privacy-Safe & Compliant: All flows are depersonalized and scrubbed of PII. Sensitive fields (like credit card numbers, user addresses, and login credentials) are replaced with realistic but synthetic data, ensuring compliance with privacy regulations.
Each flow tracks the user journey from cart to payment to confirmation, including:
Adding/removing items
Applying coupons or promo codes
Selecting shipping/delivery options
Account creation, login, or guest checkout
Inputting payment details (card, wallet, Buy Now Pay Later)
Handling validation errors or OOS scenarios
Order review and final placement
Confirmation page capture (including order summary details)
Why This Dataset?
Building LLMs, agentic shopping bots, or e-commerce automation tools demands more than just page screenshots or API logs. You need deeply contextualized, action-oriented data that reflects how real users interact with the complex, ever-changing UIs of digital commerce. Our dataset uniquely captures:
The full intent-action-outcome loop
Dynamic UI changes, modals, validation, and error handling
Nuances of cart modification, bundle pricing, delivery constraints, and multi-vendor checkouts
Mobile vs. desktop variations
Diverse merchant tech stacks (custom, Shopify, Magento, BigCommerce, native apps, etc.)
Use Cases
LLM Fine-Tuning: Teach models to reason through step-by-step transaction flows, infer next-best-actions, and generate robust, context-sensitive prompts for real-world ordering.
Agentic Shopping Bots: Train agents to navigate web/mobile checkouts autonomously, handle edge cases, and complete real purchases on behalf of users.
Action Model & RLHF Training: Provide reinforcement learning pipelines with ground truth “what happens if I do X?” data across hundreds of real merchants.
UI/UX Research & Synthetic User Studies: Identify friction points, bottlenecks, and drop-offs in modern checkout design by replaying flows and testing interventions.
Automated QA & Regression Testing: Use realistic flows as test cases for new features or third-party integrations.
What’s Included
10,000+ annotated checkout flows (retail, restaurant, marketplace)
Step-by-step event logs with metadata, DOM, and network context
Natural language explanations for each step and transition
All flows are depersonalized and privacy-compliant
Example scripts for ingesting, parsing, and analyzing the dataset
Flexible licensing for research or commercial use
Sample Categories Covered
Grocery delivery (Instacart, Walmart, Kroger, Target, etc.)
Restaurant takeout/delivery (Ub...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information about the contents of 100 Terms of Service (ToS) of online platforms. The documents were analyzed and evaluated from the point of view of the European Union consumer law. The main results have been presented in the table titled "Terms of Service Analysis and Evaluation_RESULTS." This table is accompanied by the instruction followed by the annotators, titled "Variables Definitions," allowing for the interpretation of the assigned values. In addition, we provide the raw data (analyzed ToS, in the folder "Clear ToS") and the annotated documents (in the folder "Annotated ToS," further subdivided).
SAMPLE: The sample contains 100 contracts of digital platforms operating in sixteen market sectors: Cloud storage, Communication, Dating, Finance, Food, Gaming, Health, Music, Shopping, Social, Sports, Transportation, Travel, Video, Work, and Various. The selected companies' main headquarters span four legal surroundings: the US, the EU, Poland specifically, and Other jurisdictions. The chosen platforms are both privately held and publicly listed and offer both fee-based and free services. Although the sample cannot be treated as representative of all online platforms, it nevertheless accounts for the most popular consumer services in the analyzed sectors and contains a diverse and heterogeneous set.
CONTENT: Each ToS has been assigned the following information: 1. Metadata: 1.1. the name of the service; 1.2. the URL; 1.3. the effective date; 1.4. the language of the ToS; 1.5. the sector; 1.6. the number of words in the ToS; 1.7–1.8. the jurisdiction of the main headquarters; 1.9. whether the company is public or private; 1.10. whether the service is paid or free. 2. Evaluative Variables: remedy clauses (2.1–2.5); dispute resolution clauses (2.6–2.10); unilateral alteration clauses (2.11–2.15); rights to police the behavior of users (2.16–2.17); regulatory requirements (2.18–2.20); and various (2.21–2.25). 3. Count Variables: the number of clauses seen as unclear (3.1) and the number of other documents referred to by the ToS (3.2). 4. Pull-out Text Variables: rights and obligations of the parties (4.1) and descriptions of the service (4.2).
ACKNOWLEDGEMENT: The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021, project no. 2020/37/K/HS5/02769, titled “Private Law of Data: Concepts, Practices, Principles & Politics.”
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Login Data Set for Risk-Based Authentication
Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.
This data set aims to foster research and development of Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.
The users used this SSO to access sensitive data provided by the online service, e.g., a cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce these results made on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.
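The Freeman et al. (2016) model referenced here scores a login by comparing how likely its feature values are in the overall population versus in the user's own history. Below is a much-simplified sketch of that likelihood-ratio idea; the function name, the add-one smoothing, and the use of the global distribution as the attacker model are illustrative choices, not the paper's exact formulation:

```python
from collections import Counter

def risk_score(features, user_history, global_counts, smoothing=1.0):
    """Simplified likelihood-ratio risk score (illustrative, not the
    exact Freeman et al. 2016 model): p(features | population) /
    p(features | user). Values rare for this user but common overall
    push the score up."""
    score = 1.0
    for name, value in features.items():
        hist = user_history.get(name, Counter())
        vocab = len(global_counts[name]) + 1   # +1 slot for unseen values
        # Smoothed probability of this value in the user's own history...
        p_user = (hist[value] + smoothing) / (sum(hist.values()) + smoothing * vocab)
        # ...and in the global population (standing in for attackers).
        g = global_counts[name]
        p_global = (g[value] + smoothing) / (sum(g.values()) + smoothing * vocab)
        score *= p_global / p_user
    return score
```

On the synthesized data set, `features` would be a row's categorical values (country, ASN, device type, ...), `user_history` the counts over that User ID's previous logins, and `global_counts` the counts over all rows.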
WARNING: The feature values are plausible, but still entirely artificial. Therefore, you should NOT use this data set in production systems, e.g., intrusion detection systems.
Overview
The data set contains the following features related to each login attempt on the SSO:
| Feature | Data Type | Description | Range or Example |
| --- | --- | --- | --- |
| IP Address | String | IP address belonging to the login attempt | 0.0.0.0 - 255.255.255.255 |
| Country | String | Country derived from the IP address | US |
| Region | String | Region derived from the IP address | New York |
| City | String | City derived from the IP address | Rochester |
| ASN | Integer | Autonomous system number derived from the IP address | 0 - 600000 |
| User Agent String | String | User agent string submitted by the client | Mozilla/5.0 (Windows NT 10.0; Win64; ... |
| OS Name and Version | String | Operating system name and version derived from the user agent string | Windows 10 |
| Browser Name and Version | String | Browser name and version derived from the user agent string | Chrome 70.0.3538 |
| Device Type | String | Device type derived from the user agent string | (mobile, desktop, tablet, bot, unknown)¹ |
| User ID | Integer | Identification number related to the affected user account | [Random pseudonym] |
| Login Timestamp | Integer | Timestamp related to the login attempt | [64-bit timestamp] |
| Round-Trip Time (RTT) [ms] | Integer | Server-side measured latency between client and server | 1 - 8600000 |
| Login Successful | Boolean | True: login was successful; False: login failed | (true, false) |
| Is Attack IP | Boolean | IP address was found in a known attacker data set | (true, false) |
| Is Account Takeover | Boolean | Login attempt was identified as an account takeover by the incident response team of the online service | (true, false) |
Data Creation
As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All other data was randomly generated while maintaining the logical relations and temporal order between the features.
The timestamps, however, are not identical and contain randomness. The feature values related to the IP address and user agent string were randomly generated from publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation, so the RTT entries were probably in different positions in the original data set.
The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical as in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.
Regarding the Data Values
Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.
You can recognize them by the following values:
ASNs with values >= 500000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
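Those placeholder values can be filtered out with a few lines using the standard library; the function name here is a hypothetical helper, not part of the data set's tooling:

```python
import ipaddress

# Placeholder ranges documented for the synthesized data set.
SYNTHETIC_ASN_MIN = 500_000
SYNTHETIC_NET = ipaddress.ip_network("10.0.0.0/8")

def is_synthetic_entry(ip: str, asn: int) -> bool:
    """True if the login entry carries one of the placeholder IP/ASN values."""
    return asn >= SYNTHETIC_ASN_MIN or ipaddress.ip_address(ip) in SYNTHETIC_NET
```

Dropping (or flagging) such rows before geolocation lookups avoids feeding the unrealistic addresses into external IP databases, while keeping them for risk-score experiments is safe, since they do not affect the Freeman et al. (2016) scores.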
Study Reproduction
Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.
The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.
See RESULTS.md for more details.
Ethics
By using the SSO service, the users consented to the collection and evaluation of the data for research purposes. For study reproduction and to foster RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.
The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.
Publication
You can find more details on our conducted study in the following journal article:
Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022) Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono. ACM Transactions on Privacy and Security
Bibtex
@article{Wiefling_Pump_2022, author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi}, title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}}, journal = {{ACM} {Transactions} on {Privacy} and {Security}}, doi = {10.1145/3546069}, publisher = {ACM}, year = {2022} }
License
This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069
A few (invalid) user agent strings from the original data set could not be parsed, so their device type is empty. This parse error may itself be useful information for your studies, so we kept these 1526 entries.↩︎
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Robot-at-Home dataset (Robot@Home, paper here) is a collection of raw and processed data from five domestic settings compiled by a mobile robot equipped with 4 RGB-D cameras and a 2D laser scanner. Its main purpose is to serve as a testbed for semantic mapping algorithms through the categorization of objects and/or rooms.
This dataset is unique in three aspects:
The provided data were captured with a rig of 4 RGB-D sensors with an overall field of view of 180°H. and 58°V., and with a 2D laser scanner.
It comprises diverse and numerous data: sequences of RGB-D images and laser scans from the rooms of five apartments (87,000+ observations were collected), topological information about the connectivity of these rooms, and 3D reconstructions and 2D geometric maps of the visited rooms.
The provided ground truth is dense, including per-point annotations of the categories of the objects and rooms appearing in the reconstructed scenarios, and per-pixel annotations of each RGB-D image within the recorded sequences.
During the data collection, a total of 36 rooms were completely inspected, so the dataset is rich in contextual information of objects and rooms. This is a valuable feature, missing in most of the state-of-the-art datasets, which can be exploited by, for instance, semantic mapping systems that leverage relationships like pillows are usually on beds or ovens are not in bathrooms.
Robot@Home2
Robot@Home2 is an enhanced version aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. It consists of three main components. First, a relational database that stores the contextual information and data links, compatible with Structured Query Language (SQL). Second, a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enabling users to explore the dataset without local installations. These freely available tools are expected to ease exploitation of the Robot@Home dataset and accelerate research in computer vision and robotics.
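Because the v2.x release ships as a single SQLite file, it can also be queried directly with Python's standard `sqlite3` module. The table and column names below are hypothetical stand-ins (the real schema is documented with the dataset); the sketch builds a tiny in-memory example purely to show the query pattern:

```python
import sqlite3

# Hypothetical miniature of the kind of relational layout described
# above; the real Robot@Home2 schema is documented with the dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, home TEXT, name TEXT);
    INSERT INTO rooms VALUES (1, 'apartment1', 'kitchen'),
                             (2, 'apartment1', 'bathroom'),
                             (3, 'apartment2', 'bedroom');
""")
# Parameterized query: list the rooms of one home, alphabetically.
rows = conn.execute(
    "SELECT name FROM rooms WHERE home = ? ORDER BY name", ("apartment1",)
).fetchall()
print([r[0] for r in rows])   # ['bathroom', 'kitchen']
```

Against the real file, `sqlite3.connect(path_to_database)` replaces the in-memory connection, and the dataset's own Python package offers higher-level download and query helpers.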
If you use Robot@Home2, please cite the following paper:
Gregorio Ambrosio-Cestero, Jose-Raul Ruiz-Sarmiento, Javier Gonzalez-Jimenez, The Robot@Home2 dataset: A new release with improved usability tools, in SoftwareX, Volume 23, 2023, 101490, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2023.101490.
@article{ambrosio2023robotathome2,title = {The Robot@Home2 dataset: A new release with improved usability tools},author = {Gregorio Ambrosio-Cestero and Jose-Raul Ruiz-Sarmiento and Javier Gonzalez-Jimenez},journal = {SoftwareX},volume = {23},pages = {101490},year = {2023},issn = {2352-7110},doi = {https://doi.org/10.1016/j.softx.2023.101490},url = {https://www.sciencedirect.com/science/article/pii/S2352711023001863},keywords = {Dataset, Mobile robotics, Relational database, Python, Jupyter, Google Colab}}
Version history
v1.0.1: Fixed minor bugs.
v1.0.2: Fixed some inconsistencies in directory names. The fixes were necessary to automate the generation of the next version.
v2.0.0: SQL-based dataset. Robot@Home v1.0.2 has been packed into a SQLite database, along with the RGB-D and scene files, which have been assembled into a hierarchically structured directory free of redundancies. Path tables are also provided to reference files in both the v1.0.2 and v2.0.0 directory hierarchies. This version was generated automatically from version 1.0.2 through the toolbox.
v2.0.1: A forgotten foreign key pair has been added.
v2.0.2: The views have been consolidated as tables, which allows a considerable improvement in access time.
v2.0.3: The previous version did not include the database; in this version the database has been uploaded.
v2.1.0: Depth images have been updated to 16-bit. Additionally, both the RGB and depth images are oriented in the original camera format, i.e. landscape.
Research on wireless networks and mobile computing benefits from access to data from real networks and real mobile users. Data captured from production wireless networks help us understand how real users, applications, and devices use real networks under real conditions. CRAWDAD archives data sets relevant to a variety of measurement purposes, including:
Computer malware
Human behavior modeling
Localization
Network diagnosis
Network performance analysis
Network security
Network simulation
Routing protocols
Routing protocols for Disruption Tolerant Networks (DTNs)
Social network analysis
Contact: crawdad@crawdad.org
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rapid technological innovations over the past few years have led to dramatic changes in today's mobile phone technology. While such changes can improve the quality of life of users, problematic mobile phone use can result in a range of negative outcomes, such as anxiety or, in some cases, engagement in unsafe behaviors with serious health and safety implications, such as mobile phone distracted driving. The aims of the present study are two-fold. First, this study investigated current problematic mobile phone use in Australia and its potential implications for road safety. Second, based on the changing nature and pervasiveness of mobile phones in Australian society, this study compared data from 2005 with data collected in 2018 to identify trends in problematic mobile phone use in Australia. As predicted, the results demonstrated that problematic mobile phone use in Australia increased from the first data collected in 2005. In addition, meaningful differences were found between gender and age groups, with females and users in the 18–25 year-old age group showing higher mean Mobile Phone Problem Use Scale (MPPUS) scores. Additionally, problematic mobile phone use was linked with mobile phone use while driving: participants who reported high levels of problematic mobile phone use also reported handheld and hands-free mobile phone use while driving.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for this project consists of photos of the buildings of the University of Salford, taken with a mobile phone camera from different angles and at different distances. Although this task sounds easy, it ran into several challenges, summarized below:
1. Obstacles.
a. Fixed or unremovable objects.
When taking several photos for a building or a landscape from different angels and directions ,there are some of these angels blocked by a form of a fixed object such as trees and plants, light poles, signs, statues, cabins, bicycle shades, scooter stands, generators/transformers, construction barriers, construction equipment and any other service equipment so it is unavoidable to represent some photos without these objects included, this will raise 3 questions.
- Will these objects confuse the model/application we intend to create? That is, will an obstacle prevent the model/application from identifying the designated building?
- Or will the photos be more realistic with these objects included, giving the model/application the ability to identify the buildings despite the obstacles?
- What is the maximum detection distance? In other words, how far can the mobile device running the application be from a building before it can no longer detect it?
b. Removable and moving objects.
- Any university is crowded with staff and students, especially during the busiest hours of the day, so it is hard to take some photos without a person appearing in them at certain times. Due to privacy concerns and out of respect for those individuals, such photos are better excluded.
- Parked vehicles, trolleys, and service equipment can be obstacles and might appear in the images; they can also block access to some areas, so that an image from a certain angle cannot be obtained.
- Animals such as dogs, cats, birds, or even squirrels cannot be avoided in some photos, which raises the same questions as above.
2. Weather.
In a deep learning project, more data generally means higher accuracy and lower error. At this stage of the project it was agreed to collect 50 photos per building; more photos would yield more accurate results, but due to the time limit of this project the number was fixed at 50 per building.
These photos were taken on cloudy days. To expand this work in the future, photos taken on sunny, rainy, foggy, snowy, and other weather-condition days can be included, as can photos at different times of day such as night, dawn, and sunset, providing the designated model with every opportunity to identify the buildings under all available circumstances.
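Until photos under other lighting conditions are collected, a common interim workaround is to simulate lighting variation by augmenting the existing cloudy-day photos. The sketch below is illustrative only and not part of the collected dataset; it represents a grayscale image as plain lists of pixel values, whereas a real pipeline would use an image library such as Pillow or torchvision transforms:

```python
def scale_brightness(image, factor):
    """Simulate a lighting change by scaling every pixel value.

    `image` is a grayscale image given as a list of rows of 0-255
    integers; values are clamped back into the valid 0-255 range.
    """
    return [
        [min(255, max(0, round(pixel * factor))) for pixel in row]
        for row in image
    ]

# A brighter "sunny" variant and a darker "dusk" variant of one photo:
cloudy = [[100, 200], [40, 120]]
sunny = scale_brightness(cloudy, 1.5)   # bright values cap at 255
dusk = scale_brightness(cloudy, 0.5)
```

Each augmented variant can be added to the training set alongside the original, which is one way to approximate the "all available circumstances" goal before real sunny or night photos exist.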
University House: 60 images.
The Peel Building is an important landmark of the University of Salford due to its distinctive exterior design, but unfortunately it was excluded from the selection because of maintenance activities at the time the photos for this project were collected: it was partially covered with scaffolding, with a lot of movement of personnel and equipment. If the supervisor suggests including it as a further challenge for the project, then its photos will be collected. There are many other buildings at the University of Salford, and to expand the project in the future we can include all of them. The full list of buildings can be reviewed on the interactive map at: www.salford.ac.uk/find-us
Expand Further. This project can be improved with many more capabilities; again, due to the time limit of this project, these improvements can be implemented later as future work. In simple terms, this project is to create an application that displays a building's name when a mobile device with a camera is pointed at that building. Future features to be added:
a. Address/location: this will require collecting additional data, namely the longitude and latitude of each building, or its postcode, taking into consideration how close the buildings appear on interactive map applications such as Google Maps, Google Earth, or iMaps.
b. Description of the building: what the building is for, which school occupies it, and what facilities it contains.
c. Interior images: all photos at this stage were taken of the buildings' exteriors. Would interior photos make a difference to the model/application? For example, if the user is inside Newton or Chapman and opens the application, will the building be identified, especially since the interiors of these buildings have highly similar corridors, rooms, halls, and labs? Will furniture and assets act as obstacles or as identification marks?
d. Directions to a specific area/floor inside the building: if interior images work with the model/application, it would be useful to add a search option that guides the user to a specific area with directions. For example, if the user is inside the Newton Building and searches for lab 141, the application would direct them to the first floor with an interactive arrow that updates as they approach their destination.
Alternatively, if the application can identify the building from its interior, a drop-down list could be activated for each floor of the building. For example, if the model/application identifies the Newton Building, the drop-down list would present interactive tabs for each floor; selecting a floor's tab would display the facilities on that floor (e.g., pressing the floor 1 tab would open a screen showing the facilities on floor 1). Furthermore, if the model/application identifies a different building, it should activate a different number of floors, as buildings differ in their number of floors. This feature could be improved with a voice assistant that directs the user after a search (similar to the voice assistant in Google Maps, but applied to the interiors of the university's buildings).
e. Top view: if a drone with a camera can be afforded, it can provide aerial images and top views of the buildings to add to the model/application. However, these images may face the same issue as the interior images: buildings can look similar to each other from above, with additional obstacles such as water tanks and AC units.
Other Questions:
Will the model/application be reproducible? The presumed answer should be yes, provided the model/application is fed with appropriate data (images), such as images of restaurants, schools, supermarkets, hospitals, government facilities, etc.
ABSTRACT: With the popularization of low-cost mobile and wearable sensors, prior studies have utilized such sensors to track and analyze people's mental well-being, productivity, and behavioral patterns. However, there is still a lack of open datasets collected in in-the-wild contexts with affective and cognitive state labels such as emotion, stress, and attention, which limits advances in research on affective computing and human-computer interaction. This work presents K-EmoPhone, an in-the-wild multi-modal dataset collected from 77 university students over seven days. The dataset contains (i) continuous probing of peripheral physiological signals and mobility data measured by commercial off-the-shelf devices; (ii) context and interaction data collected from individuals' smartphones; and (iii) 5,582 self-reported affect states, such as emotion, stress, attention, and disturbance, acquired by the experience sampling method. We anticipate that the presented dataset will contribute to the advancement of affective computing, emotion intelligence technologies, and attention management based on mobile and wearable sensor data.
Last update: Apr. 12, 2023
-----------------------------
* Version 1.1.2 (Jun. 3, 2023)
* Version 1.1.1 (Apr. 12, 2023)
* Version 1.1.0 (Feb. 5, 2023)
* Version 1.0.0 (Aug. 3, 2022)
https://www.marketreportanalytics.com/privacy-policy
The app analytics market, valued at $7.29 billion in 2025, is experiencing robust growth, projected to expand at a compound annual growth rate (CAGR) of 21.09% from 2025 to 2033. This surge is driven by several key factors. The increasing adoption of mobile applications across diverse industries, coupled with the rising need for businesses to understand user behavior and optimize app performance, fuels the demand for sophisticated analytics solutions. Furthermore, advancements in data analytics technologies, including artificial intelligence (AI) and machine learning (ML), are enabling more insightful and actionable data analysis, further propelling market expansion. The diverse application of app analytics across marketing/advertising, revenue generation, and in-app performance monitoring across various sectors like BFSI, e-commerce, media, travel and tourism, and IT and telecom significantly contributes to this growth. The market is segmented by deployment (mobile apps and website/desktop apps) and end-user industry, with mobile app analytics currently dominating due to the widespread adoption of smartphones. The competitive landscape is characterized by a mix of established technology giants like Google and Amazon alongside specialized app analytics providers like AppsFlyer and Mixpanel. These companies are continuously innovating, integrating new technologies, and expanding their product offerings to cater to the evolving needs of businesses. While the North American market currently holds a significant share, the Asia-Pacific region is expected to witness substantial growth in the coming years driven by increasing smartphone penetration and digitalization initiatives. However, factors like data privacy concerns and the rising complexity of integrating various analytics tools could pose challenges to market growth. 
Nonetheless, the overall outlook for the app analytics market remains positive, indicating substantial opportunities for players across the value chain. Recent developments include: June 2024 - Comscore and Kochava unveiled an innovative performance media measurement solution, providing marketers with enhanced insights. This cutting-edge cross-screen solution empowers marketers to better understand how linear TV ad campaigns impact both online and offline actions. By integrating Comscore’s Exact Commercial Ratings (ECR) data with Kochava’s sophisticated marketing mix modeling, the solution facilitates the measurement of crucial metrics, including mobile app activities (such as installs and in-app purchases) and website interactions. June 2024 - AppsFlyer announced the integration of its Data Collaboration Platform with Start.io, an omnichannel advertising platform that focuses on real-time mobile audiences for publishers. Through this collaboration, businesses leveraging the AppsFlyer Data Collaboration Platform can merge their Start.io data with campaign metrics and audience insights, creating a more comprehensive dataset for precise audience targeting. Key drivers for this market are: Increasing Usage of Mobile/Web Apps Across Various End-user Industries, Increasing Adoption of Technologies like 5G Technology and Deeper Penetration of Smartphones; Increase in the Amount of Time Spent on Mobile Devices Coupled With the Increasing Focus on Enhancing Customer Experience. Potential restraints include: Increasing Usage of Mobile/Web Apps Across Various End-user Industries, Increasing Adoption of Technologies like 5G Technology and Deeper Penetration of Smartphones; Increase in the Amount of Time Spent on Mobile Devices Coupled With the Increasing Focus on Enhancing Customer Experience. Notable trends are: Media and Entertainment Industry Expected to Capture Significant Share.
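As a sanity check on the headline figures above (an illustrative calculation of ours, not a number taken from the report), compounding the $7.29 billion 2025 base at a 21.09% CAGR over the eight years to 2033 gives:

```python
# Project the 2033 market size from the 2025 base using the stated CAGR.
base_2025 = 7.29          # USD billions (from the report summary)
cagr = 0.2109             # 21.09% compound annual growth rate
years = 2033 - 2025       # 8 annual compounding periods

projected_2033 = base_2025 * (1 + cagr) ** years
print(f"Implied 2033 market size: ${projected_2033:.1f}B")  # roughly $33.7B
```

This is simply the compound-growth identity `future = base * (1 + rate)^years`; the report itself does not state a 2033 value, so the ~$33.7B figure is only what its stated base and CAGR imply.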
Datos brings to market anonymized, at-scale, consolidated, privacy-secured datasets with a granularity rarely found in the market. Get access to the desktop and mobile browsing behavior of millions of users across the globe, packaged into clean, easy-to-understand data products and reports.
The Datos Activity Feed is an event-level accounting of all observed URL visits executed by devices which Datos has access to over a given period of time.
This feed can be delivered daily, providing the previous day’s data. It can be filtered by any of the fields, so you can focus on what matters to you, whether that is specific markets or specific domains.
Now available with the Datos Low-Latency Feed. This add-on ensures delivery of approximately 99% of all devices before markets open in New York (the lowest-latency product on the market). Our clickstream data is made up of an array of upstream sources; the DLLF makes the daily output of these sources available as they arrive and are processed, rather than in a once-daily batch.
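As a concrete illustration of the field-based filtering described above, a daily feed could be narrowed to the domains of interest as sketched below. The record layout (`domain`, `url`, `country`) is hypothetical; the actual Datos Activity Feed schema should be consulted for real field names:

```python
# Filter an event-level feed down to the domains of interest.
# Field names here are illustrative, not the real Datos schema.
DOMAINS_OF_INTEREST = {"example.com", "example.org"}

def filter_feed(records, domains=DOMAINS_OF_INTEREST):
    """Keep only URL-visit events whose domain is in the watch list."""
    return [r for r in records if r.get("domain") in domains]

feed = [
    {"domain": "example.com", "url": "https://example.com/a", "country": "US"},
    {"domain": "other.net", "url": "https://other.net/b", "country": "IN"},
]
kept = filter_feed(feed)  # only the example.com event remains
```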
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries for different products, issued by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016. Each file in the collection has a name indicating the location from which the search was performed, the user ID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html. The locations are Philippines (PHI), United States (US), and India (IN). The user IDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India. Products were chosen following 130 keywords (e.g., MP3 player, MP4 watch, personal organizer, television, etc.). In the following, we describe how the search results were collected. Each user has a fresh profile; creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page. To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links. A fully-fledged web browser is used to get the correct desktop version of the website under investigation, because websites can be designed to behave differently according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website. The prices are the retail prices displayed by Google Shopping in US dollars (thus excluding shipping fees). Several frameworks have been proposed for interacting with web browsers and analysing results from search engines; this research adopts OpenWPM. OpenWPM is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with its own associated cookies. The experiments run, on average, for 24 hours.
In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., to India) via tunneling through SOCKS proxies; this way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. For each query, we consider the first page of results, counting 40 products; among them, the focus of the experiments is mostly on the top 10 and top 3 results. Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas. The search results have been analyzed to check for evidence of price steering based on users' location. One term of usage applies: in any research product whose findings are based on this dataset, please cite @inproceedings{DBLP:conf/ircdl/CozzaHPN19, author = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}}, title = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping}, booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings}, pages = {29--43}, year = {2019}, crossref = {DBLP:conf/ircdl/2019}, url = {https://doi.org/10.1007/978-3-030-11226-4_3}, doi = {10.1007/978-3-030-11226-4_3}, timestamp = {Fri, 18 Jan 2019 23:22:50 +0100}, biburl = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19}, bibsource = {dblp computer science bibliography, https://dblp.org} }
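The file-naming convention above can be unpacked mechanically. A minimal sketch follows; the regular expression and the example filename are our own, inferred from the stated `no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html` pattern, not shipped with the dataset:

```python
import re

# Pattern inferred from: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
NAME_RE = re.compile(
    r"no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)"
    r"\.(?P<product>.+?)\.shopping_testing\.(?P<run>\d+)\.html$"
)

def parse_result_filename(name):
    """Extract location, user ID, product, and run number from a filename."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    return match.groupdict()

# Hypothetical filename consistent with the convention described above:
info = parse_result_filename("no_email_PHI_27.Television.shopping_testing.1.html")
```

Grouping the parsed records by `location` and `product` is then enough to compare result pages across the three countries, e.g. when checking for price steering.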