53 datasets found

‘Android Phones’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Android Phones’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-android-phones-d4c3/70fa3a6f/?iid=000-883&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Android Phones’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/khaiid/android-phones on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Android is the most used operating system in the mobile phone field. It would be interesting to explore the different manufacturers and devices that use it, and which versions of the Android operating system are widely used.

Content

The data has about 1300 rows with 4 attributes, described as follows:

Name: Mobile phone name

Brand: Manufacturer brand name

Release: Release date of the mobile

Version: Android version of the mobile

Questions to be answered

How many phones use Android 11?

Which phones were released the latest?

Which brand has the most phones released?

How many brands are there?
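
The four questions above can be sketched with the Python standard library. This is only an illustration: the column names come from the description, but the sample rows are placeholders (not real records), and the ISO date format for Release and the bare "11" form for Version are assumptions about how the real file encodes those fields.

```python
import csv
import io
from collections import Counter

# Placeholder rows, NOT taken from the real dataset. Column names follow the
# description above; the value formats are assumptions.
SAMPLE_CSV = """Name,Brand,Release,Version
Phone A,BrandX,2020-09-01,11
Phone B,BrandX,2021-03-15,11
Phone C,BrandY,2019-05-20,10
Phone D,BrandZ,2021-03-15,12
"""


def summarize(csv_text):
    """Answer the four questions for a CSV with Name/Brand/Release/Version."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # How many phones use Android 11?
    android_11 = sum(1 for r in rows if r["Version"].strip() == "11")
    # Which phones were released the latest? (ISO dates sort as strings.)
    latest_date = max(r["Release"] for r in rows)
    latest = [r["Name"] for r in rows if r["Release"] == latest_date]
    # Which brand has the most phones, and how many brands are there?
    brand_counts = Counter(r["Brand"] for r in rows)
    top_brand, _ = brand_counts.most_common(1)[0]
    return {
        "android_11": android_11,
        "latest": latest,
        "top_brand": top_brand,
        "brand_total": len(brand_counts),
    }


summary = summarize(SAMPLE_CSV)
print(summary)
```

If the real Release column uses a different date format, the string comparison would need to be replaced with proper date parsing.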

Data Collection

This data uses material from https://en.wikipedia.org/wiki/List_of_Android_smartphones, which is released under the Creative Commons Attribution-Share-Alike License 3.0.

--- Original source retains full ownership of the source dataset ---
Number of smartphone users in the United States 2014-2029
statista.com
ai-chatbox.pro
Updated May 5, 2025
Cite
Statista Research Department (2025). Number of smartphone users in the United States 2014-2029 [Dataset]. https://www.statista.com/topics/2711/us-smartphone-market/
Dataset updated
May 5, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
United States
Description
The number of smartphone users in the United States was forecast to increase continuously between 2024 and 2029 by a total of 17.4 million users (+5.61 percent). After the fifteenth consecutive year of growth, the smartphone user base is estimated to reach 327.54 million users, and therefore a new peak, in 2029. Notably, the number of smartphone users has increased continuously over the past years.

Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of smartphone users in countries like Mexico and Canada.
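
As a quick consistency check on the forecast figures, the 2024 base can be backed out from the 2029 value and the stated increase. This is a small sketch using only the numbers quoted in the description; the variable names are illustrative.

```python
# Figures quoted in the description (millions of users).
users_2029_m = 327.54
increase_m = 17.4

# Implied 2024 base and the relative growth over 2024-2029.
users_2024_m = users_2029_m - increase_m
growth_pct = increase_m / users_2024_m * 100

print(f"Implied 2024 base: {users_2024_m:.2f} million")
print(f"Growth 2024-2029: {growth_pct:.2f}%")
```

The result agrees with the +5.61 percent stated in the description.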
Data from: Google Play Store Dataset
opendatabay.com
Updated Jun 15, 2025
Cite
Bright Data (2025). Google Play Store Dataset [Dataset]. https://www.opendatabay.com/data/premium/33624898-8133-421d-9b3b-42f76e1e4fe2
Dataset updated
Jun 15, 2025
Dataset authored and provided by
Bright Data
Area covered
Website Analytics & User Experience
Description
Google Play Store dataset to explore detailed information about apps, including ratings, descriptions, updates, and developer details. Popular use cases include app performance analysis, market research, and consumer behavior insights.

Use our Google Play Store dataset to explore detailed information about apps available on the platform, including app titles, developers, monetization features, user ratings, reviews, and more. This dataset also includes data on app descriptions, safety measures, download counts, recent updates, and compatibility, providing a complete overview of app performance and features.

Tailored for app developers, marketers, and researchers, this dataset offers valuable insights into user preferences, app trends, and market dynamics. Whether you're optimizing app development, conducting competitive analysis, or tracking app performance, the Google Play Store dataset is an essential resource for making data-driven decisions in the mobile app ecosystem.

Dataset Features

url: The URL link to the app’s detail page on the Google Play Store.

title: The name of the application.

developer: The developer or company behind the app.

monetization_features: Information regarding how the app generates revenue (e.g., in-app purchases, ads).

images: Links or references to images associated with the app.

about: Details or a summary description of the app.

data_safety: Information regarding data safety and privacy practices.

rating: The overall rating of the app provided by its users.

number_of_reviews: The total count of user reviews received.

star_reviews: A breakdown of reviews by star ratings.

reviews: Reviews and user feedback about the app.

what_new: Information on the latest updates or features added to the app.

more_by_this_developer: Other apps by the same developer.

content_rating: The content rating which guides suitability based on user age.

downloads: The download count or range indicating the app’s popularity.

country: The country associated with the app listing.

app_category: The category or genre under which the app is classified.

Distribution

Data Volume: 17 Columns and 65.54M Rows

Format: CSV
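
Before working with a delivery, it can help to confirm that the file carries the 17 columns listed under Dataset Features. The sketch below checks a header row against that list; it assumes the CSV has a header row using exactly these names, which the listing does not confirm.

```python
import csv
import io

# Column names as listed under "Dataset Features" above.
EXPECTED_COLUMNS = [
    "url", "title", "developer", "monetization_features", "images", "about",
    "data_safety", "rating", "number_of_reviews", "star_reviews", "reviews",
    "what_new", "more_by_this_developer", "content_rating", "downloads",
    "country", "app_category",
]


def check_header(csv_text):
    """Return (missing, extra) column names for the first row of a CSV."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    extra = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, extra


# A header built from the documented names passes the check.
missing, extra = check_header(",".join(EXPECTED_COLUMNS) + "\n")
print(missing, extra)
```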

Usage

This dataset is ideal for a variety of applications:

App Market Analysis: Enables market researchers to extract insights on app popularity, engagement, and trends across different categories.

Machine Learning: Can be used by data scientists to build recommendation engines or sentiment analysis models based on app review data.

User Behavior Studies: Facilitates academic or industrial research into user preferences and behavior with respect to mobile applications.

Coverage

Geographic Coverage: global.

License

CUSTOM. Please review the respective licenses below:

1. Data Provider's License - Bright Data Master Service Agreement

Who Can Use It

Data Scientists: To train machine learning models for app popularity prediction, sentiment analysis, or recommendation systems.

Researchers: For academic or scientific studies into market trends, consumer behavior, and app performance analysis.

Businesses: For strategic analysis, developing market insights, or enhancing app development and user engagement strategies.

Suggested Dataset Name

Play store Insights

Android App Scope

Market Analytics

Play Store Metrics Vault

AppTrend360: Google Play Edition

Pricing

Based on Delivery frequency

Up to approximately $0.0025 per record. Minimum order: $250.

Approximately 10M new records are added each month. Approximately 13.8M records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.

Monthly

New snapshot each month, 12 snapshots/year. Paid monthly.

Quarterly

New snapshot each quarter, 4 snapshots/year. Paid quarterly.

Bi-annual

New snapshot every 6 months, 2 snapshots/year. Paid twice a year.

One-time purchase

New snapshot, one-time delivery. Paid once.
Global smartphone sales to end users 2007-2023
statista.com
Updated Oct 15, 2024
Cite
Statista (2024). Global smartphone sales to end users 2007-2023 [Dataset]. https://www.statista.com/statistics/263437/global-smartphone-sales-to-end-users-since-2007/
Dataset updated
Oct 15, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
In 2022, around 1.39 billion smartphones were sold worldwide, with this number forecast to drop to 1.34 billion in 2023.

Smartphone penetration rate still on the rise

Less than half of the world’s total population owned a smart device in 2016, but the smartphone penetration rate has continued climbing, reaching 78.05 percent in 2020. By 2025, it is forecast that almost 87 percent of all mobile users in the United States will own a smartphone, an increase from the 27 percent of mobile users in 2010.

Smartphone end user sales

In the United States alone, sales of smartphones were projected to be worth around 73 billion U.S. dollars in 2021, an increase from 18 billion dollars in 2010. Global sales of smartphones are expected to increase from 2020 to 2021 in every major region, as the market starts to recover from the initial impact of the coronavirus (COVID-19) pandemic.
Emergent smartphone users' dataset
datadryad.org
zenodo.org
zip
Updated Nov 18, 2021
Cite
Shamaila Hayat (2021). Emergent smartphone users' dataset [Dataset]. http://doi.org/10.5061/dryad.gqnk98sp9
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gqnk98sp9
Dataset updated
Nov 18, 2021
Dataset provided by
Dryad
Authors
Shamaila Hayat
Time period covered
2021
Description
The data is cleaned: all inconsistencies and erroneous records have been removed. These two datasets are used to see how the composition of the contact-book of emergent users differs from that of traditional users in aspects such as its size, the prevalence of special symbols, the proportion of contacts dialed through the phone-book, and the percentage of unintelligible contact names. Aggregated data for 30 emergent users and 30 traditional users is provided in the form of CSV files to replicate the data analysis results. To reproduce the graphs for the usability analysis, R scripts are also provided in the same repository. These scripts contain the required data vectors. The graphs show the efficiency, effectiveness, and satisfaction of emergent users on conventional contact-book interfaces.
Dataset of "Extinguishing Ransomware - A Hybrid Approach to Android...
zenodo.org
Updated Jan 24, 2020
Cite
Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic (2020). Dataset of "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" [Dataset]. http://doi.org/10.5281/zenodo.1420449
Unique identifier
https://doi.org/10.5281/zenodo.1420449
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alberto Ferrante; Francesco Mercaldo; Miroslaw Malek; Jelena Milosevic
Description
Protection against ransomware is particularly relevant in systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers. In "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" (see references for details), we describe a hybrid (static + dynamic) malware detection method that has extremely good accuracy (100% detection rate, with a false positive rate below 4%).

We release a dataset related to the dynamic detection part of the aforementioned method, containing execution traces of ransomware Android applications, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 666 ransomware applications taken from the Heldroid project [https://github.com/necst/heldroid] (the app repository is unavailable at the moment). Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 20,000 stimuli were applied with a maximum execution time of 15 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces neither of the two limits is reached due to emulator hiccups. Collected features are related to memory and CPU usage, network interaction, and system calls; they are sampled with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that made use of Android Debug Bridge (adb) and was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.

In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:

ransom-per_app-csv.zip - features obtained by executing ransomware applications, one CSV per application

ransom-unified-csv.zip - features obtained by executing ransomware applications, only one CSV file
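
The relationship between the per-app archive and the unified file can be sketched as a concatenation that keeps a single header row. This is an illustration only, not the authors' tooling: it assumes every per-app CSV shares the same header, and the toy archive and its column names (`cpu`, `mem`) are hypothetical stand-ins for the real feature columns.

```python
import csv
import io
import zipfile


def unify_csvs(zip_bytes):
    """Concatenate every CSV in a zip archive, keeping one header row."""
    out = io.StringIO()
    writer = None
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as fh:
                text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
                reader = csv.reader(text)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if writer is None:
                    # Write the header once, taken from the first CSV.
                    writer = csv.writer(out, lineterminator="\n")
                    writer.writerow(header)
                writer.writerows(reader)
    return out.getvalue()


# Toy in-memory archive with hypothetical columns.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("app_a.csv", "cpu,mem\n1,10\n2,20\n")
    archive.writestr("app_b.csv", "cpu,mem\n3,30\n")

unified = unify_csvs(buf.getvalue())
print(unified)
```

Whether the published unified file also records which application each row came from is not stated in the description, so the sketch does not add such a column.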
Android Mischief Dataset
stratosphereips.org
zip
Updated May 7, 2021
+ more versions
Cite
Kamila Babayeva (2021). Android Mischief Dataset [Dataset]. https://www.stratosphereips.org/android-mischief-dataset
Available download formats: zip
Dataset updated
May 7, 2021
Dataset provided by
Stratosphere Lab, Department of Electrical Engineering, Czech Technical University
Authors
Kamila Babayeva
License
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Time period covered
2020
Area covered
Czech Republic, Prague
Description
The Android Mischief Dataset is a dataset of network traffic from mobile phones infected with Android remote access trojans (RATs). Its goal is to offer the community a dataset for learning about and analyzing the network behaviour of RATs, in order to propose new detections to protect our devices. The current version of the dataset includes 7 packet captures from 7 executed Android RATs. The Android Mischief Dataset was created at the Stratosphere Laboratory, Czech Technical University in Prague.
Google Location History (GLH) mobility dataset
zenodo.org
Updated Jan 4, 2024
Cite
Thiago Andrade (2024). Google Location History (GLH) mobility dataset [Dataset]. http://doi.org/10.5281/zenodo.8349569
Unique identifier
https://doi.org/10.5281/zenodo.8349569
Dataset updated
Jan 4, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Thiago Andrade
Description
This is a GPS dataset acquired from Google.

Google tracks the user’s device location through Google Maps, which also works on Android devices, the iPhone, and the web.
It’s possible to see the Timeline from the user’s settings in the Google Maps app on Android or directly from the Google Timeline Website.
It has detailed information such as when an individual is walking, driving, and flying.
Such functionality of tracking can be enabled or disabled on demand by the user directly from the smartphone or via the website.
Google has a Takeout service where users can download all their data, or select which Google products they want to download data from.
The dataset contains 120,847 instances from a single user, covering a period of 9 months (253 unique days) from February 2019 to October 2019.
Each instance comprises a (latitude, longitude) pair and a timestamp.
All the data was delivered in a single CSV file.
As the locations of this dataset are well known by the researchers, this dataset will be used as ground truth in many mobility studies.
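
The "unique days" figure above can be reproduced by collapsing timestamps to calendar dates. The sketch below assumes column names `latitude`, `longitude`, and `timestamp` and ISO 8601 timestamps; the actual header and timestamp encoding of the published CSV are not given in the description, and the sample rows are placeholders.

```python
import csv
import io
from datetime import datetime

# Placeholder rows with assumed column names and timestamp format.
SAMPLE = """latitude,longitude,timestamp
41.15,-8.61,2019-02-01T08:00:00
41.16,-8.62,2019-02-01T18:30:00
41.15,-8.61,2019-02-03T09:15:00
"""


def unique_days(csv_text):
    """Return the set of calendar dates that have at least one GPS fix."""
    days = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        days.add(datetime.fromisoformat(row["timestamp"]).date())
    return days


days = unique_days(SAMPLE)
print(len(days), min(days), max(days))
```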

Please cite the following papers in order to use the datasets:

T. Andrade, B. Cancela, and J. Gama, "Discovering locations and habits from human mobility data," Annals of Telecommunications, vol. 75, no. 9, pp. 505–521, 2020. DOI: 10.1007/s12243-020-00807-x

T. Andrade, B. Cancela, and J. Gama, "From mobility data to habits and common pathways," Expert Systems, vol. 37, no. 6, p. e12627, 2020. DOI: 10.1111/exsy.12627
The icsi/netalyzr-android dataset
impactcybertrust.org
Updated Jan 21, 2019
Cite
External Data Source (2019). The icsi/netalyzr-android dataset [Dataset]. http://doi.org/10.23721/100/1478847
Unique identifier
https://doi.org/10.23721/100/1478847
Dataset updated
Jan 21, 2019
Authors
External Data Source
Description
This dataset was collected by the ICSI Netalyzr app for Android to develop a characterization of how operational decisions, such as network configurations, business models, and relationships between operators introduce diversity in service quality and affect user security and privacy. We delve in detail beyond the radio link and into network configuration and business relationships in six countries. We identify the widespread use of transparent middleboxes such as HTTP and DNS proxies, analyzing how they actively modify user traffic, compromise user privacy, and potentially undermine user security. In addition, we identify network sharing agreements between operators, highlighting the implications of roaming and characterizing the properties of MVNOs, including that a majority are simply rebranded versions of major operators. More broadly, our findings using this data highlight the importance of considering higher-layer relationships when seeking to analyze mobile traffic in a sound fashion.

Contact: narseo@icsi.berkeley.edu
DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps...
data.niaid.nih.gov
Updated Jan 24, 2020
Cite
Mercaldo, Francesco (2020). DYNAMISM - Postprocessed Execution Traces Of Android Malware and Benign Apps [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1296277
Dataset updated
Jan 24, 2020
Dataset provided by
Ferrante, Alberto
Milosevic, Jelena
Malek, Miroslaw
Mercaldo, Francesco
Description
Protection against malware is particularly relevant in systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers.

Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.

We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin Dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces neither of the two limits is reached due to emulator hiccups. Collected features are related to memory and CPU usage, network interaction, and system calls; they are sampled with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that made use of Android Debug Bridge (adb) and was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.

In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:

benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application

benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file

malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application

malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
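
The stimulus-generation step described above (Monkey driven through adb, bounded by an event count and a time limit) might look roughly like the following. This is a hypothetical Python sketch, not the authors' shell script: the package name `com.example.app`, the serial `emulator-5554`, and both helper functions are illustrative, and only the command is built here, nothing is executed.

```python
import subprocess


def monkey_command(package, max_events, serial=None, seed=None):
    """Build an `adb shell monkey` invocation as an argument list."""
    cmd = ["adb"]
    if serial is not None:
        cmd += ["-s", serial]      # adb option: target one emulator/device
    cmd += ["shell", "monkey", "-p", package]
    if seed is not None:
        cmd += ["-s", str(seed)]   # monkey option: seed for a repeatable stream
    cmd.append(str(max_events))    # number of pseudo-random events to inject
    return cmd


def run_with_limits(package, max_events=2000, max_seconds=600, serial=None):
    """Run Monkey, giving up once the wall-clock limit is exceeded."""
    cmd = monkey_command(package, max_events, serial=serial)
    try:
        subprocess.run(cmd, timeout=max_seconds, check=False)
        return "finished"
    except subprocess.TimeoutExpired:
        return "timed out"


# Only build and show the command; running it requires adb and an emulator.
example = monkey_command("com.example.app", 2000, serial="emulator-5554")
print(" ".join(example))
```

The defaults mirror the limits quoted in the description (2,000 stimuli, 10 minutes). Note that the timeout here stops the local adb process; a real harness would also need to stop Monkey on the emulator and reset the system image between samples.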
Daily Facebook Mobile App Feedback
opendatabay.com
Updated Jul 4, 2025
Cite
Datasimple (2025). Daily Facebook Mobile App Feedback [Dataset]. https://www.opendatabay.com/data/ai-ml/feee9cde-b0aa-40aa-946d-ab8cc10cd587
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Datasimple
Area covered
Reviews & Ratings
Description
This dataset provides daily updated user reviews and ratings for the Facebook Android application, as collected from the Google Play Store. It offers valuable insights into user sentiment, app performance across different versions, and general feedback from the user base. The data can be utilised for analysis of user engagement, identification of common issues, and understanding public perception of the Facebook app.

Columns

reviewId: A unique identifier assigned to each individual review.

userName: The identifier for the user who provided the rating and review.

content: The textual description or comment left by the user as part of their review.

score: The numerical rating given by the user for the app. Ratings typically range from 1 to 5.

thumbsUpCount: The number of 'likes' or upvotes a particular review has received.

reviewCreatedVersion: The specific version of the app that was in use when the review and rating were submitted.

at: The date and time when the rating and review were created.

appVersion: The version of the app.

Distribution

The data files are typically provided in CSV format. A separate sample file will be made available on the platform. The dataset includes a significant number of unique review IDs (over 345,000) and unique user names (nearly 300,000). Ratings show a strong positive bias, with over 223,000 entries receiving scores between 4.92 and 5.00. The majority of reviews have a low number of 'thumbs up' (over 354,000 entries are in the 0-2214 range), though some reviews have gathered a very high number of likes, up to 110,743. The 'appVersion' column has approximately 18% null values, with version 476.0.0.49.74 accounting for 4% and other versions making up the remaining 78%. The time range for the reviews spans from 24th June 2016 to 3rd July 2025, with a notable increase in review volume observed particularly from late 2023 through to mid-2024. The exact total number of records is not specified, but the column distributions indicate a substantial dataset.
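
Column statistics like the ones quoted above (score distribution, share of null `appVersion` values) can be recomputed from the CSV. The sketch below uses the documented column names, but the sample rows are placeholders, and treating an empty string as "null" is an assumption about how missing values are encoded in the file.

```python
import csv
import io
from collections import Counter

# Placeholder rows; column names follow the "Columns" list above.
SAMPLE = """reviewId,userName,content,score,thumbsUpCount,reviewCreatedVersion,at,appVersion
r1,user1,Works fine,5,0,1.0,2024-01-01 10:00:00,1.0
r2,user2,Crashes,1,3,,2024-01-02 11:00:00,
r3,user3,Great,5,1,1.1,2024-01-03 12:00:00,1.1
r4,user4,Okay,4,0,1.1,2024-01-04 13:00:00,1.1
"""


def profile(csv_text):
    """Return the score distribution and the share of empty appVersion values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    scores = Counter(int(r["score"]) for r in rows)
    empty = sum(1 for r in rows if not r["appVersion"].strip())
    return scores, empty / len(rows)


scores, null_share = profile(SAMPLE)
print(dict(scores), f"{null_share:.0%}")
```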

Usage

This dataset is ideal for a variety of applications, including:

* Performing sentiment analysis on user reviews to gauge public opinion and identify trends.

* Monitoring app performance and user satisfaction over time and across different app versions.

* Conducting natural language processing (NLP) research on user-generated content.

* Informing product development and prioritisation based on direct user feedback.

* Benchmarking against competitor apps within the mobile application market.

Coverage

The dataset's geographic scope is global, reflecting reviews from the Google Play Store. The time range of the data extends from 24th June 2016 to 3rd July 2025, with new reviews being updated on a daily basis. Information regarding the demographic scope of the users is not provided within the dataset.

License

CC BY-NC-SA

Who Can Use It

App Developers: To gain immediate insights into user issues, bugs, and feature requests.

Data Scientists & Machine Learning Engineers: For training and validating NLP models, particularly for sentiment analysis or topic modelling on app reviews.

Product Managers: To understand user satisfaction, identify areas for improvement, and track the impact of new features or updates.

Market Researchers: To analyse market trends, competitive positioning, and consumer perceptions within the mobile app sector.

Academics: For research on user behaviour, mobile app ecosystems, and online review dynamics.

Dataset Name Suggestions

Facebook Android App User Reviews

Google Play Store: Facebook App Ratings

Daily Facebook Mobile App Feedback

Facebook App Review Data

User Sentiment: Facebook Android

Attributes

Original Data Source: Facebook Reviews [DAILY UPDATE]
Worrying confessions: A look at data safety labels on Android
zenodo.org
data.niaid.nih.gov
xz
Updated Sep 18, 2022
Cite
Benjamin Altpeter (2022). Worrying confessions: A look at data safety labels on Android [Dataset]. http://doi.org/10.5281/zenodo.7088557
Available download formats: xz
Unique identifier
https://doi.org/10.5281/zenodo.7088557
Dataset updated
Sep 18, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Altpeter
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Google Play Store recently introduced a data safety section in order to give users accessible insights into apps’ data collection practices. We analyzed the labels of 43,927 of the most popular apps. Almost one third of the apps with a label claim not to collect any data. But we also saw popular apps, including apps meant for children, admitting to collecting and sharing highly sensitive data like the user’s sexual orientation or health information for tracking and advertising purposes. To verify the declarations, we recorded the network traffic of 500 apps, finding that more than one quarter of them transmitted tracking data not declared in their data safety label.

This data set contains a dump of our database, including the top chart data and data safety labels from September 07, 2022, and the recorded network traffic.

The analysis is available at our blog: https://www.datarequests.org/blog/android-data-safety-labels-analysis/
The source code for the analysis is available on GitHub: https://github.com/datenanfragen/android-data-safety-label-analysis
ChatGPT Google Play Reviews
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). ChatGPT Google Play Reviews [Dataset]. https://www.opendatabay.com/data/ai-ml/fd040053-e431-4725-b180-af87c1c0a328
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Reviews & Ratings
Description
This dataset provides a daily-updated collection of user reviews and ratings specifically for the ChatGPT Android application. It includes crucial information such as the review text, associated ratings, and the dates when reviews were posted. The dataset also details the relevancy of each review. It serves as a valuable resource for understanding user sentiment, tracking app performance over time, and analysing trends within the AI and Large Language Model (LLM) application landscape.

Columns

reviewId: A unique identifier assigned to each user review.

id: An additional, distinct identifier associated with each review.

userName: The name of the user who submitted the review.

content: The textual body or comment provided by the user in their review.

score: The numerical rating given by the user, typically on a scale from 1 to 5, indicating their satisfaction or experience.

thumbsUpCount: The total number of 'likes' or positive reactions that a particular review has received.

appVersion: The specific version of the ChatGPT Android application that the user was running when they posted their review.

reviewDate: The precise date and time at which the user review was originally posted.

Distribution

The dataset is primarily available in a tabular format, typically a CSV file, facilitating easy integration and analysis. It comprises over 637,000 unique reviews, reflecting a substantial volume of user feedback. This dataset is updated on a daily basis, ensuring access to the latest user opinions and rating trends. While the exact file size is not specified, the number of records indicates a considerable volume of data.

Usage

This dataset is ideal for various analytical applications, including:

* Sentiment Analysis: Extracting and understanding user emotions and opinions towards the ChatGPT Android app.

* Natural Language Processing (NLP) Tasks: Training and testing NLP models for text classification, entity recognition, and language generation based on real-world user input.

* App Performance Monitoring: Tracking changes in user ratings and feedback over time to gauge application performance and identify areas for improvement.

* Market Research: Gaining insights into user perception of AI and LLM applications within the mobile market.

* Competitive Analysis: Comparing user feedback for the ChatGPT app against other similar applications.

* Feature Prioritisation: Identifying desired features or common pain points mentioned by users to inform product development.
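
For the app performance monitoring use case, tracking ratings over time can be as simple as averaging `score` per calendar month. The sketch below assumes `reviewDate` is an ISO-style "YYYY-MM-DD HH:MM:SS" string, which the listing does not confirm; the sample rows are placeholders.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Placeholder rows; only the columns needed for the trend are included.
SAMPLE = """reviewId,score,reviewDate
a,5,2023-07-25 09:00:00
b,3,2023-07-30 10:00:00
c,4,2023-08-02 11:00:00
"""


def monthly_mean_score(csv_text):
    """Return {YYYY-MM: mean score} in chronological order."""
    buckets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = datetime.fromisoformat(row["reviewDate"])
        buckets[when.strftime("%Y-%m")].append(int(row["score"]))
    return {month: mean(vals) for month, vals in sorted(buckets.items())}


trend = monthly_mean_score(SAMPLE)
print(trend)
```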

Coverage

This dataset offers global coverage, collecting reviews from users across the world. The time range for the reviews spans from 25 July 2023 to 30 June 2025. This extensive period allows for longitudinal studies of user sentiment and app evolution. It captures feedback from a diverse demographic of ChatGPT Android app users. Some data points, such as appVersion, may occasionally have null values.

License

CC-BY-NC-SA

Who Can Use It

Data Scientists and Analysts: For conducting sentiment analysis, topic modelling, and training machine learning models on large volumes of text data.

App Developers and Product Managers: To understand user feedback, identify bugs, track feature requests, and plan future app enhancements.

Market Researchers: To assess market trends, consumer behaviour, and the reception of AI-powered applications.

Academic Researchers: For studies on human-AI interaction, user experience, and public perception of emerging technologies.

Businesses in the AI/LLM Sector: To perform competitive analysis and inform strategic decisions based on real user feedback.

Dataset Name Suggestions

ChatGPT Android App Reviews

Daily ChatGPT Mobile Ratings

AI App User Feedback Data

ChatGPT Google Play Reviews

User Sentiment for ChatGPT App

Attributes

Original Data Source: ChatGPT reviews [DAILY UPDATED]
Mobile phone users Philippines 2021-2029
statista.com
ai-chatbox.pro
Updated Feb 28, 2025
+ more versions
Cite
Statista (2025). Mobile phone users Philippines 2021-2029 [Dataset]. https://www.statista.com/forecasts/558756/number-of-mobile-internet-user-in-the-philippines
Explore at:
Dataset updated
Feb 28, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Philippines
Description
The number of smartphone users in the Philippines was forecast to increase by a total of 5.6 million users (+7.29 percent) between 2024 and 2029. This overall increase does not happen continuously, notably not in 2026, 2027, 2028 and 2029. The smartphone user base is estimated to amount to 82.33 million users in 2029. Notably, the number of smartphone users had been increasing continuously over the preceding years.
Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
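The forecast figures can be cross-checked with simple arithmetic. The sketch below assumes the 5.6 million increase is measured from the 2024 value to the 2029 value of 82.33 million; the implied 2024 baseline is a derived number, not one stated in the description.

```python
# Figures quoted in the description, in millions of users.
users_2029_m = 82.33
increase_m = 5.6

# Assumption: the increase is measured from the 2024 value, so the
# 2024 baseline is the 2029 value minus the increase.
baseline_2024_m = users_2029_m - increase_m

# Relative growth over the 2024 baseline, in percent.
growth_pct = increase_m / baseline_2024_m * 100

print(f"implied 2024 baseline: {baseline_2024_m:.2f} million")
print(f"implied growth: {growth_pct:.2f} percent")
```

The implied growth comes out at roughly 7.3 percent, which agrees with the quoted +7.29 percent once the rounding of the 5.6 million figure is taken into account.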
Finnish Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Finnish Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/finnish-product-image-ocr-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Finnish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Finnish language.
Dataset Contents & Diversity:
Containing a total of 2000 images, this Finnish OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Finnish text.
Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All these images were captured by native Finnish speakers to ensure text quality and to avoid toxic content and PII text. To maintain image quality, all images were taken with the latest iOS and Android mobile devices with cameras above 5MP. In this training dataset, images are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image is renamed to correspond to its metadata entry.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Finnish text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Finnish crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, using our crowd community, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Finnish language. Your journey to enhanced language understanding and processing starts here.
Bahasa Product Image OCR Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Bahasa Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/bahasa-product-image-ocr-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Bahasa Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Bahasa language.
Dataset Contents & Diversity:
Containing a total of 2000 images, this Bahasa OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Bahasa text.
Images have been captured under varying lighting conditions, both day and night, along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All these images were captured by native Bahasa speakers to ensure text quality and to avoid toxic content and PII text. To maintain image quality, all images were taken with the latest iOS and Android mobile devices with cameras above 5MP. In this training dataset, images are available in both JPEG and HEIC formats.
Metadata:
Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image is renamed to correspond to its metadata entry.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Bahasa text recognition models.
Update & Custom Collection:
We're committed to expanding this dataset by continuously adding more images with the assistance of our native Bahasa crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, using our crowd community, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements.
License:
This Image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Bahasa language. Your journey to enhanced language understanding and processing starts here.
Lab Study Dataset: Fast and Secure Contact Exchange in Groups
zenodo.org
data.niaid.nih.gov
zip
Updated Feb 5, 2025
Cite
Florentin Putz; Steffen Haesler; Matthias Hollick (2025). Lab Study Dataset: Fast and Secure Contact Exchange in Groups [Dataset]. http://doi.org/10.5281/zenodo.13324112
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13324112
Dataset updated
Feb 5, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Florentin Putz; Steffen Haesler; Matthias Hollick
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains the lab study dataset and evaluation R source code from the paper "Sounds Good? Fast and Secure Contact Exchange in Groups" by Florentin Putz, Steffen Haesler, and Matthias Hollick in Proceedings of the ACM on Human-Computer Interaction (CSCW '24).

Abstract:

Trustworthy digital communication requires the secure exchange of contact information, but current approaches lack usability and scalability for larger groups of users. We evaluate the usability of two secure contact exchange systems: the current state of the art, SafeSlinger, and our newly designed protocol, PairSonic, which extends trust from physical encounters to spontaneous online communication. Our lab study (N=45) demonstrates PairSonic's superior usability, automating the tedious verification tasks from previous approaches via an acoustic out-of-band channel. Although participants significantly preferred our system, minimizing user effort surprisingly decreased the perceived security for some users, who associated security with complexity. We discuss user perceptions of the different protocol components and identify remaining usability barriers for CSCW application scenarios.

Dataset:

Our pseudonymous dataset contains usability, security, and preference scores, completion times, reported usage of nine types of social and collaborative tools, and seven demographic and control variables, for each of our 45 participants.

Analysis source code:

Our R Markdown source code includes the full reproducible code of our analysis. This code generates all statistical figures from our paper. The code can also be used to reproduce our quantitative results and tables.

Please refer to the README.md file and our paper for further details about the dataset and the lab study.

Acknowledgments:

This work has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center [LOEWE/1/12/519/03/05.001(0016)/72].
COVID-19 Pandemic Wikipedia Readership
figshare.com
txt
Updated May 31, 2023
Cite
Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin (2023). COVID-19 Pandemic Wikipedia Readership [Dataset]. http://doi.org/10.6084/m9.figshare.14548032.v3
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14548032.v3
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data release includes two Wikipedia datasets related to the readership of the project during the early COVID-19 pandemic period. The first dataset is COVID-19 article page views by country; the second dataset is one-hop navigation where one of the two pages is COVID-19 related. The data covers roughly the first six months of the pandemic, more specifically from January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic.

Wikipedia articles are considered COVID-19 related according to the methodology described here; the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset, though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.

Privacy considerations

While this data is considered valuable for the insight that it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken to not inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers.

The Wikimedia Foundation started to release most viewed articles by country from Jan 2021. At the beginning of the COVID-19 pandemic, an exemption was made to store reader data about the pandemic with additional privacy protections:
- exclude the page views from users engaged in an edit session
- exclude reader data from specific countries (with a few exceptions)
- the aggregated statistics are based on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv). As a control, a 1% random sample of reader sessions that have no pageviews to COVID-19-related articles was kept. In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represent less than 10% of pageviews for a country for that day. The randomization and filters occur on a daily cadence with all timestamps in UTC.
- exclude power users - i.e. userhashes with greater than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and also in theory would help reduce the chance of a single user heavily skewing the data.
- exclude readership from users of the iOS and Android Wikipedia apps.

In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on per-country readership data, and the COVID-19 privacy protections in particular, see this phabricator.

To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts. For example, a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.

Datasets

The datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia Foundation publishes already, with the addition of the country-specific counts.

COVID-19 pageviews

The file covid_pageviews.tsv contains:
- pageview counts for COVID-19 related pages, aggregated by week and country
- k-anonymity threshold of 100
- example: In the 13th week of 2020 (23 March - 29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times from readers in Belgium
- as a control bucket, we include pageview counts to all pages aggregated by week and country. Due to privacy considerations during the collection of the data, the control bucket was sampled at ~1% of all view traffic. The view counts for the control title are thus proportional to the total number of pageviews to all pages.

The file is ~8 MB and contains ~134000 data points across the 27 weeks, 108 countries, and 168 projects.

Covid reader session bigrams

The file covid_session_bigrams.tsv contains:
- number of occurrences of visits to pages A -> B, where either A or B is a COVID-19 related article. Note that the bigrams are tuples (from, to) of articles viewed in succession; the underlying mechanism can be clicking on a link in an article, but it may also have been a new search or reading both articles based on links from third source articles. In contrast, the clickstream data is based on referral information only
- aggregated by month and country
- k-anonymity threshold of 100
- example: In March of 2020, there were 1000 occurrences of readers accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae from Chile

The file is ~10 MB and contains ~90000 bigrams across the 6 months, 96 countries, and 56 projects.

Contact

Please reach out to research-feedback@wikimedia.org for any questions.
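The k-anonymity threshold and flooring described for this dataset can be sketched as follows. This is a minimal illustration under assumptions, not the code used by the dataset authors: the function name, the `(week, country, page)` key layout, and the sample events are hypothetical, and only the threshold of 100 and the flooring to a multiple of 100 come from the description.

```python
from collections import Counter

K_THRESHOLD = 100  # k-anonymity threshold stated in the description
FLOOR_TO = 100     # counts are floored to a multiple of this value


def release_counts(events, k=K_THRESHOLD, floor_to=FLOOR_TO):
    """Aggregate events per key, drop keys seen fewer than k times,
    and floor the remaining counts to a multiple of floor_to."""
    counts = Counter(events)
    released = {}
    for key, n in counts.items():
        if n >= k:  # groups below the threshold are not released at all
            released[key] = (n // floor_to) * floor_to
    return released


# Hypothetical sample: one tuple of (week, country, page) per pageview.
events = (
    [("2020-W13", "Belgium", "Pandémie_de_Covid-19_en_Italie")] * 257
    + [("2020-W13", "Belgium", "SARS-CoV-2")] * 99
)

print(release_counts(events))
```

In this sample the first page is released with a count of 200 (257 floored to a multiple of 100), while the second page is dropped because 99 views fall below the threshold.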
ANTI-PHISHING IN ANDROID PHONE PROJECT REPORT
kaggle.com
Updated Jul 7, 2025
Cite
Kamal Acharya (2025). ANTI-PHISHING IN ANDROID PHONE PROJECT REPORT [Dataset]. http://doi.org/10.34740/kaggle/dsv/12394228
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/12394228
Dataset updated
Jul 7, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kamal Acharya
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The word 'phishing' is derived from 'fishing'. It refers to an attack in which the attacker lures users to a fake Web site by sending them fake e-mails (or instant messages) and stealthily obtains the victim's personal information, such as user name, password, and national security ID. This information can then be used for future targeted advertisements or even identity theft attacks (e.g., transferring money from the victim's bank account). The most frequently used attack method is to send potential victims e-mails that appear to come from banks, online organizations, or ISPs. In these e-mails, the attackers make up a pretext, e.g. that the password of your credit card has been mis-entered many times, or that they are providing upgrade services, to lure you to their Web site to confirm or modify your account number and password through the hyperlink provided in the e-mail (Leon, 2008).
Statistical values of the CNN model.
plos.figshare.com
xls
Updated Jan 19, 2024
Cite
Muhammad Aamir; Muhammad Waseem Iqbal; Mariam Nosheen; M. Usman Ashraf; Ahmad Shaf; Khalid Ali Almarhabi; Ahmed Mohammed Alghamdi; Adel A. Bahaddad (2024). Statistical values of the CNN model. [Dataset]. http://doi.org/10.1371/journal.pone.0296722.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0296722.t001
Dataset updated
Jan 19, 2024
Dataset provided by
PLOS ONE
Authors
Muhammad Aamir; Muhammad Waseem Iqbal; Mariam Nosheen; M. Usman Ashraf; Ahmad Shaf; Khalid Ali Almarhabi; Ahmed Mohammed Alghamdi; Adel A. Bahaddad
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Android is the most popular operating system on the latest mobile smart devices. Many Android applications have been developed for this operating system and have become an essential part of our daily lives. Unfortunately, different kinds of Android malware have also been generated alongside this endless stream of applications; installed through API calls, granted permissions, and extra package installations, they undermine the system's security rules and harm the system. It is therefore essential to detect and classify Android malware to protect users' privacy and limit damage. Much research has already been conducted on techniques for Android malware detection and classification. In this work, we present AMDDLmodel, a deep learning technique that consists of a convolutional neural network. This model works based on different parameters, filter sizes, number of epochs, learning rates, and layers to detect and classify Android malware. The Drebin dataset, consisting of 215 features, was used for the model evaluation. The model shows an accuracy of 99.92%. The other statistical values are precision, recall, and F1-score. AMDDLmodel introduces innovative deep learning for Android malware detection, enhancing accuracy and practical user security through inventive feature engineering and comprehensive performance evaluation. The AMDDLmodel shows the highest accuracy compared to the existing techniques.
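For readers unfamiliar with the statistical values named above, the sketch below shows the standard definitions of precision, recall, and F1-score for a binary malware/benign classifier. The confusion-matrix counts are hypothetical and are not taken from the paper or from the Drebin dataset.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: malware correctly flagged, fp: benign apps wrongly flagged,
    fn: malware missed. Returns 0.0 where a ratio is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Accuracy alone can look high on imbalanced data, which is why these three values are usually reported alongside it.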
