Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Android Phones’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/khaiid/android-phones on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Android is the most widely used operating system for mobile phones. It would be interesting to explore the different manufacturers and devices that use it, and which versions of the Android operating system are most widely used.
The data has about 1300 rows with 4 attributes, described as follows:
Name: Mobile phone name
Brand: Manufacturer brand name
Release: Release date of the mobile
Version: Android version of the mobile
How many phones use Android 11? Which phones were released most recently? Which brand has released the most phones? How many brands are there?
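The questions above can be explored with a few lines of code. The following is a minimal sketch, not the original analysis: it uses the four column names from the description, but the sample rows, the `Android 11` label format, and the ISO date format in `Release` are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file has about 1300 rows.
# Column names follow the description; values and formats are assumptions.
SAMPLE_CSV = """Name,Brand,Release,Version
Phone A,BrandX,2021-03-01,Android 11
Phone B,BrandX,2020-09-15,Android 10
Phone C,BrandY,2021-06-20,Android 11
"""

def summarize(csv_text):
    """Answer the four exploratory questions for a CSV given as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    brand_counts = Counter(row["Brand"] for row in rows)
    return {
        # How many phones use Android 11?
        "android_11_phones": sum(1 for row in rows if row["Version"] == "Android 11"),
        # Which phone was released most recently? (ISO dates sort as strings)
        "latest_release": max(rows, key=lambda row: row["Release"])["Name"],
        # Which brand has released the most phones?
        "top_brand": brand_counts.most_common(1)[0][0],
        # How many brands are there?
        "brand_count": len(brand_counts),
    }

summary = summarize(SAMPLE_CSV)
```

If the real `Release` column uses a different date format, it would need to be parsed before comparison rather than compared as text.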
This data uses material from https://en.wikipedia.org/wiki/List_of_Android_smartphones which is released under the Creative Commons Attribution-Share-Alike License 3.0.
--- Original source retains full ownership of the source dataset ---
The number of smartphone users in the United States was forecast to increase continuously between 2024 and 2029 by a total of 17.4 million users (+5.61 percent). After the fifteenth consecutive year of growth, the smartphone user base is estimated to reach 327.54 million users, and therefore a new peak, in 2029. Notably, the number of smartphone users has increased continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in countries like Mexico and Canada.
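As a quick consistency check on the forecast figures above, the implied 2024 base can be derived from the 2029 estimate and the total increase. This is only arithmetic on the numbers quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the forecast (millions of users).
increase_millions = 17.4
users_2029_millions = 327.54

# Implied 2024 base and the relative growth over 2024-2029.
base_2024_millions = users_2029_millions - increase_millions
growth_percent = increase_millions / base_2024_millions * 100

print(f"2024 base: {base_2024_millions:.2f} million, growth: {growth_percent:.2f}%")
```

The implied base of about 310.14 million users gives a growth rate that matches the stated +5.61 percent.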
A Google Play Store dataset for exploring detailed information about apps, including ratings, descriptions, updates, and developer details. Popular use cases include app performance analysis, market research, and consumer behavior insights.
Use our Google Play Store dataset to explore detailed information about apps available on the platform, including app titles, developers, monetization features, user ratings, reviews, and more. This dataset also includes data on app descriptions, safety measures, download counts, recent updates, and compatibility, providing a complete overview of app performance and features.
Tailored for app developers, marketers, and researchers, this dataset offers valuable insights into user preferences, app trends, and market dynamics. Whether you're optimizing app development, conducting competitive analysis, or tracking app performance, the Google Play Store dataset is an essential resource for making data-driven decisions in the mobile app ecosystem.
This dataset is ideal for a variety of applications:
CUSTOM Please review the respective licenses below: 1. Data Provider's License - Bright Data Master Service Agreement
~Up to $0.0025 per record. Min order $250
Approximately 10M new records are added each month. Approximately 13.8M records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.
New snapshot each month, 12 snapshots/year Paid monthly
New snapshot each quarter, 4 snapshots/year Paid quarterly
New snapshot every 6 months, 2 snapshots/year Paid twice-a-year
New snapshot one-time delivery Paid once
In 2022, smartphone vendors sold around 1.39 billion smartphones worldwide, with this number forecast to drop to 1.34 billion in 2023.
Smartphone penetration rate still on the rise
Less than half of the world’s total population owned a smart device in 2016, but the smartphone penetration rate has continued climbing, reaching 78.05 percent in 2020. By 2025, it is forecast that almost 87 percent of all mobile users in the United States will own a smartphone, an increase from the 27 percent of mobile users in 2010.
Smartphone end user sales
In the United States alone, sales of smartphones were projected to be worth around 73 billion U.S. dollars in 2021, an increase from 18 billion dollars in 2010. Global sales of smartphones are expected to increase from 2020 to 2021 in every major region, as the market starts to recover from the initial impact of the coronavirus (COVID-19) pandemic.
Data is cleaned. All inconsistencies and erroneous records have been removed. These two datasets are used to see how the composition of the contact book of emergent users differs from that of traditional users in aspects such as its size, the prevalence of special symbols, the proportion of contacts dialed through the phone book, and the percentage of unintelligible contact names. Aggregated data for 30 emergent users and 30 traditional users is provided as CSV files to replicate the data analysis results. To reproduce the graphs for the usability analysis, R scripts are also provided in the same repository. These scripts contain the required data vectors. The graphs show the efficiency, effectiveness, and satisfaction of emergent users on conventional contact-book interfaces.
Protection against ransomware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers. In "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" (see references for details), we describe a hybrid (static + dynamic) malware detection method that has extremely good accuracy (100% detection rate, with a false positive rate below 4%).
We release a dataset related to the dynamic detection part of the aforementioned method, containing execution traces of ransomware Android applications, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 666 ransomware applications taken from the Heldroid project [https://github.com/necst/heldroid] (the app repository is unavailable at the moment). Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 20,000 stimuli were applied with a maximum execution time of 15 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces neither of the two limits is reached due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are monitored with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that made use of Android Debug Bridge (adb) and that was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. The Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.
In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:
ransom-per_app-csv.zip - features obtained by executing ransomware applications, one CSV per application
ransom-unified-csv.zip - features obtained by executing ransomware applications, only one CSV file
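The collection procedure described above relied on a shell script driving adb and the Monkey exerciser; that script is not part of this description. The following Python sketch only illustrates how the two stated limits (20,000 stimuli, 15 minutes) could be applied per application. The function names and the example package name are hypothetical, and the emulator re-initialization and the two-second feature sampling are deliberately left out.

```python
import subprocess

MAX_EVENTS = 20_000    # stimuli cap stated in the dataset description
MAX_SECONDS = 15 * 60  # execution-time cap stated in the dataset description

def monkey_command(package, events=MAX_EVENTS):
    """Build the adb command that sends pseudo-random events to one package."""
    return ["adb", "shell", "monkey", "-p", package, str(events)]

def exercise(package):
    """Run Monkey against one package; return False if the time cap is hit.

    Assumes adb is on PATH and an emulator or device is already running.
    """
    try:
        subprocess.run(monkey_command(package), timeout=MAX_SECONDS, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False

# Hypothetical package name, used only to show the generated command.
example_command = monkey_command("com.example.app")
```

Building the command separately from running it keeps the limits easy to inspect without needing an emulator.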
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
The Android Mischief Dataset is a dataset of network traffic from mobile phones infected with Android RATs. Its goal is to offer the community a dataset to learn and analyze the network behaviour of RATs, in order to propose new detections to protect our devices. The current version of the dataset includes 7 packet captures from 7 executed Android RATs. The Android Mischief Dataset was created at the Stratosphere Laboratory, Czech Technical University in Prague.
This is a GPS dataset acquired from Google.
Google tracks the user’s device location through Google Maps, which also works on Android devices, the iPhone, and the web.
It’s possible to see the Timeline from the user’s settings in the Google Maps app on Android or directly from the Google Timeline Website.
It has detailed information such as when an individual is walking, driving, and flying.
Such functionality of tracking can be enabled or disabled on demand by the user directly from the smartphone or via the website.
Google has a Takeout service where users can download all their data, or select the Google products whose data they want to download.
The dataset contains 120,847 instances from a period of 9 months or 253 unique days from February 2019 to October 2019 from a single user.
Each instance comprises a (latitude, longitude) pair and a timestamp.
All the data was delivered in a single CSV file.
As the locations of this dataset are well known by the researchers, this dataset will be used as ground truth in many mobility studies.
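Since each instance is a (latitude, longitude) pair with a timestamp, basic mobility statistics are straightforward to compute. The sketch below is illustrative only: the column names, the timestamp format, and the sample coordinates are assumptions, not the actual layout of the CSV file.

```python
import csv
import io
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

# Hypothetical rows; header names and timestamp format are assumptions.
SAMPLE_CSV = """latitude,longitude,timestamp
41.1500,-8.6100,2019-02-01T08:00:00
41.1510,-8.6100,2019-02-01T08:05:00
41.1600,-8.6200,2019-02-02T09:00:00
"""

def load_points(csv_text):
    """Parse rows into (lat, lon, datetime) tuples."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        points.append((float(row["latitude"]), float(row["longitude"]),
                       datetime.fromisoformat(row["timestamp"])))
    return points

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

points = load_points(SAMPLE_CSV)
unique_days = {timestamp.date() for _, _, timestamp in points}
first_hop_km = haversine_km(*points[0][:2], *points[1][:2])
```

Counting distinct dates in this way is how a figure such as "253 unique days" could be checked against the full file.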
Please cite the following papers in order to use the datasets:
T. Andrade, B. Cancela, and J. Gama, "Discovering locations and habits from human mobility data," Annals of Telecommunications, vol. 75, no. 9, pp. 505–521, 2020.
10.1007/s12243-020-00807-x (DOI)
and
T. Andrade, B. Cancela, and J. Gama, "From mobility data to habits and common pathways," Expert Systems, vol. 37, no. 6, p. e12627, 2020.
10.1111/exsy.12627 (DOI)
This dataset was collected by the ICSI Netalyzr app for Android to develop a characterization of how operational decisions, such as network configurations, business models, and relationships between operators introduce diversity in service quality and affect user security and privacy. We delve in detail beyond the radio link and into network configuration and business relationships in six countries. We identify the widespread use of transparent middleboxes such as HTTP and DNS proxies, analyzing how they actively modify user traffic, compromise user privacy, and potentially undermine user security. In addition, we identify network sharing agreements between operators, highlighting the implications of roaming and characterizing the properties of MVNOs, including that a majority are simply rebranded versions of major operators. More broadly, our findings using this data highlight the importance of considering higher-layer relationships when seeking to analyze mobile traffic in a sound fashion.
Contact: narseo@icsi.berkeley.edu
Protection against malware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its potential for monetization by attackers.
Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.
We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin Dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied in this timeframe. In some of the traces neither of the two limits is reached due to emulator hiccups. The collected features relate to memory and CPU usage, network interaction, and system calls, and they are monitored with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference (e.g., changed settings, running processes, and modifications of the operating system files) from previously run samples, the Android operating system was re-initialized before running each application. The application execution process was automated by means of a shell script that made use of Android Debug Bridge (adb) and that was run on a Linux PC. The Monkey application exerciser was used in the script as a generator of the aforementioned stimuli. The Monkey is a command-line tool that can be run on any emulator instance or on a device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.
In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:
benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application
benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file
malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application
malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
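The relationship between the per-app and unified files listed above can be sketched as a simple concatenation. This is an illustration under assumptions, not the authors' tooling: it assumes every per-app CSV shares the same header, and the feature names `cpu_usage` and `memory_kb` are hypothetical. Whether the real unified files carry an extra column identifying the source application is not stated here.

```python
import csv
import io

def unify(per_app_csvs):
    """Concatenate per-app CSV texts into one CSV, writing the header once."""
    out = io.StringIO()
    writer = None
    for csv_text in per_app_csvs:
        reader = csv.DictReader(io.StringIO(csv_text))
        for row in reader:
            if writer is None:
                # Take the column order from the first file that has rows.
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                                        lineterminator="\n")
                writer.writeheader()
            writer.writerow(row)
    return out.getvalue()

# Hypothetical per-app traces with made-up feature names and values.
app_a = "cpu_usage,memory_kb\n10,2048\n12,2100\n"
app_b = "cpu_usage,memory_kb\n7,1024\n"
unified = unify([app_a, app_b])
```

Working from the per-app files keeps application boundaries explicit, while the unified file is more convenient for training a single model.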
This dataset provides daily updated user reviews and ratings for the Facebook Android application, as collected from the Google Play Store. It offers valuable insights into user sentiment, app performance across different versions, and general feedback from the user base. The data can be utilised for analysis of user engagement, identification of common issues, and understanding public perception of the Facebook app.
The data files are typically provided in CSV format. A separate sample file will be made available on the platform. The dataset includes a significant number of unique review IDs (over 345,000) and unique user names (nearly 300,000). Ratings show a strong positive bias, with over 223,000 entries receiving scores between 4.92 and 5.00. The majority of reviews have a low number of 'thumbs up' (over 354,000 entries are in the 0-2214 range), though some reviews have gathered a very high number of likes, up to 110,743. The 'appVersion' column has approximately 18% null values, with version 476.0.0.49.74 accounting for 4% and other versions making up the remaining 78%. The time range for the reviews spans from 24th June 2016 to 3rd July 2025, with a notable increase in review volume observed particularly from late 2023 through to mid-2024. The exact total number of records is not specified, but the column distributions indicate a substantial dataset.
This dataset is ideal for a variety of applications, including:
* Performing sentiment analysis on user reviews to gauge public opinion and identify trends.
* Monitoring app performance and user satisfaction over time and across different app versions.
* Conducting natural language processing (NLP) research on user-generated content.
* Informing product development and prioritisation based on direct user feedback.
* Benchmarking against competitor apps within the mobile application market.
The dataset's geographic scope is global, reflecting reviews from the Google Play Store. The time range of the data extends from 24th June 2016 to 3rd July 2025, with new reviews being updated on a daily basis. Information regarding the demographic scope of the users is not provided within the dataset.
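Column statistics like those quoted above (share of null `appVersion` values, rating distribution, maximum 'thumbs up' count) can be recomputed directly from the CSV. The sketch below is a minimal illustration: only `appVersion` is named in the description, so the other column names (`reviewId`, `score`, `thumbsUpCount`) and all sample values are assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; only the appVersion column name comes from the description.
SAMPLE_CSV = """reviewId,score,thumbsUpCount,appVersion
r1,5,0,476.0.0.49.74
r2,5,3,
r3,1,120,470.0.0.1.2
r4,4,0,476.0.0.49.74
"""

def profile(csv_text):
    """Compute simple column statistics for a reviews CSV given as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing_versions = sum(1 for row in rows if not row["appVersion"])
    return {
        "rows": len(rows),
        "null_version_share": missing_versions / len(rows),
        "score_counts": Counter(int(row["score"]) for row in rows),
        "max_thumbs_up": max(int(row["thumbsUpCount"]) for row in rows),
    }

stats = profile(SAMPLE_CSV)
```

Here an empty field is treated as null; a real export might encode missing values differently.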
CC BY-NC-SA
Original Data Source: Facebook Reviews [DAILY UPDATE]
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Google Play Store recently introduced a data safety section in order to give users accessible insights into apps’ data collection practices. We analyzed the labels of 43,927 of the most popular apps. Almost one third of the apps with a label claims not to collect any data. But we also saw popular apps, including apps meant for children, admitting to collecting and sharing highly sensitive data like the user’s sexual orientation or health information for tracking and advertising purposes. To verify the declarations, we recorded the network traffic of 500 apps, finding more than one quarter of them transmitting tracking data not declared in their data safety label.
This data set contains a dump of our database, including the top chart data and data safety labels from September 07, 2022, and the recorded network traffic.
The analysis is available at our blog: https://www.datarequests.org/blog/android-data-safety-labels-analysis/
The source code for the analysis is available on GitHub: https://github.com/datenanfragen/android-data-safety-label-analysis
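The core comparison in the study, declared data types versus data observed in network traffic, amounts to a set difference. The following sketch is not taken from the linked repository; the function name and the data type labels are hypothetical and only illustrate the idea.

```python
def undeclared_transmissions(declared, observed):
    """Return observed data types that are absent from the declared label."""
    return sorted(set(observed) - set(declared))

# Hypothetical label and traffic observations for one app.
declared_types = ["email address", "approximate location"]
observed_types = ["approximate location", "advertising id", "email address"]
undeclared = undeclared_transmissions(declared_types, observed_types)
```

An empty result would mean that everything seen in the traffic was covered by the app's data safety label.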
This dataset provides a daily-updated collection of user reviews and ratings specifically for the ChatGPT Android application. It includes crucial information such as the review text, associated ratings, and the dates when reviews were posted. The dataset also details the relevancy of each review. It serves as a valuable resource for understanding user sentiment, tracking app performance over time, and analysing trends within the AI and Large Language Model (LLM) application landscape.
The dataset is primarily available in a tabular format, typically a CSV file, facilitating easy integration and analysis. It comprises over 637,000 unique reviews, reflecting a substantial volume of user feedback. This dataset is updated on a daily basis, ensuring access to the latest user opinions and rating trends. While the exact file size is not specified, the number of records indicates a considerable volume of data.
This dataset is ideal for various analytical applications, including:
* Sentiment Analysis: Extracting and understanding user emotions and opinions towards the ChatGPT Android app.
* Natural Language Processing (NLP) Tasks: Training and testing NLP models for text classification, entity recognition, and language generation based on real-world user input.
* App Performance Monitoring: Tracking changes in user ratings and feedback over time to gauge application performance and identify areas for improvement.
* Market Research: Gaining insights into user perception of AI and LLM applications within the mobile market.
* Competitive Analysis: Comparing user feedback for the ChatGPT app against other similar applications.
* Feature Prioritisation: Identifying desired features or common pain points mentioned by users to inform product development.
This dataset offers global coverage, collecting reviews from users across the world. The time range for the reviews spans from 25 July 2023 to 30 June 2025. This extensive period allows for longitudinal studies of user sentiment and app evolution. It captures feedback from a diverse demographic of ChatGPT Android app users. Some data points, such as appVersion, may occasionally have null values.
CC-BY-NC-SA
Original Data Source: ChatGPT reviews [DAILY UPDATED]
The number of smartphone users in the Philippines was forecast to increase between 2024 and 2029 by a total of 5.6 million users (+7.29 percent). This overall increase does not happen continuously, notably not in 2026, 2027, 2028 and 2029. The smartphone user base is estimated to amount to 82.33 million users in 2029. Notably, the number of smartphone users has increased continuously over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Finnish Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Finnish language.
Dataset Contents & Diversity: Containing a total of 2000 images, this Finnish OCR dataset offers a diverse distribution across different types of front images of products. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Finnish text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All these images were captured by native Finnish speakers to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5 MP to capture all these images and maintain image quality. In this training dataset, images are available in both JPEG and HEIC formats.
Metadata: Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata like image orientation, country, language, and device information. Each image is renamed to correspond to its metadata.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Finnish text recognition models.
Update & Custom Collection: We're committed to expanding this dataset by continuously adding more images with the assistance of our native Finnish crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding boxes, or transcribe the text in the images, to align with your specific project requirements using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Finnish language. Your journey to enhanced language understanding and processing starts here.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Bahasa Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Bahasa language.
Dataset Contents & Diversity: Containing a total of 2000 images, this Bahasa OCR dataset offers a diverse distribution across different types of front images of products. In this dataset, you'll find a variety of text that includes product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.
To ensure the diversity of the dataset and to build a robust text recognition model, we allow only a limited number (fewer than five) of unique images from a single source. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Bahasa text.
Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.
All these images were captured by native Bahasa speakers to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5 MP to capture all these images and maintain image quality. In this training dataset, images are available in both JPEG and HEIC formats.
Metadata: Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata like image orientation, country, language, and device information. Each image is renamed to correspond to its metadata.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Bahasa text recognition models.
Update & Custom Collection: We're committed to expanding this dataset by continuously adding more images with the assistance of our native Bahasa crowd community.
If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.
Furthermore, we can annotate or label the images with bounding boxes, or transcribe the text in the images, to align with your specific project requirements using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Bahasa language. Your journey to enhanced language understanding and processing starts here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains the lab study dataset and evaluation R source code from the paper "Sounds Good? Fast and Secure Contact Exchange in Groups" by Florentin Putz, Steffen Haesler, and Matthias Hollick in Proceedings of the ACM on Human-Computer Interaction (CSCW '24).
Abstract:
Trustworthy digital communication requires the secure exchange of contact information, but current approaches lack usability and scalability for larger groups of users. We evaluate the usability of two secure contact exchange systems: the current state of the art, SafeSlinger, and our newly designed protocol, PairSonic, which extends trust from physical encounters to spontaneous online communication. Our lab study (N=45) demonstrates PairSonic's superior usability, automating the tedious verification tasks from previous approaches via an acoustic out-of-band channel. Although participants significantly preferred our system, minimizing user effort surprisingly decreased the perceived security for some users, who associated security with complexity. We discuss user perceptions of the different protocol components and identify remaining usability barriers for CSCW application scenarios.
Dataset:
Our pseudonymous dataset contains usability, security, and preference scores, completion times, reported usage of nine types of social and collaborative tools, and seven demographic and control variables, for each of our 45 participants.
Analysis source code:
Our R Markdown source code includes the full reproducible code of our analysis. This code generates all statistical figures from our paper. The code can also be used to reproduce our quantitative results and tables.
Please refer to the README.md file and our paper for further details about the dataset and the lab study.
Acknowledgments:
This work has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center [LOEWE/1/12/519/03/05.001(0016)/72].
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data release includes two Wikipedia datasets related to the readership of the project as it relates to the early COVID-19 pandemic period. The first dataset is COVID-19 article page views by country, the second dataset is one hop navigation where one of the two pages are COVID-19 related. The data covers roughly the first six months of the pandemic, more specifically from January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic.Wikipedia articles are considered COVID-19 related according the methodology described here, the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.Privacy considerationsWhile this data is considered valuable for the insight that it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken to not inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers.The Wikimedia foundation started to release most viewed articles by country from Jan 2021. At the beginning of the COVID-19 an exemption was made to store reader data about the pandemic with additional privacy protections:- exclude the page views from users engaged in an edit session- exclude reader data from specific countries (with a few exceptions)- the aggregated statistics are based on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv). As a control, a 1% random sample of reader sessions that have no pageviews to COVID-19-related articles was kept. 
In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represents less than 10% of pageviews for a country for that day. The randomization and filters occurs on a daily cadence with all timestamps in UTC.- exclude power users - i.e. userhashes with greater than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and also in theory would help reduce the chance of a single user heavily skewing the data.- exclude readership from users of the iOS and Android Wikipedia apps. In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on readership data per country data, and the COVID-19 privacy protections in particular, see this phabricator.To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts. For example, a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.DatasetsThe datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia foundation publishes already, with the addition of the country specific counts.COVID-19 pageviewsThe file covid_pageviews.tsv contains:- pageview counts for COVID-19 related pages, aggregated by week and country- k-anonymity threshold of 100- example: In the 13th week of 2020 (23 March - 29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times from readers in Belgium- as a control bucket, we include pageview counts to all pages aggregated by week and country. 
  Due to privacy considerations during the collection of the data, the control bucket was sampled at ~1% of all view traffic. The view counts for the control title are thus proportional to the total number of pageviews to all pages.
The file is ~8 MB and contains ~134000 data points across the 27 weeks, 108 countries, and 168 projects.
Covid reader session bigrams
The file covid_session_bigrams.tsv contains:
- number of occurrences of visits to pages A -> B, where either A or B is a COVID-19 related article. Note that the bigrams are tuples (from, to) of articles viewed in succession; the underlying mechanism can be clicking on a link in an article, but it may also have been a new search or reading both articles based on links from third source articles. In contrast, the clickstream data is based on referral information only.
- aggregated by month and country
- k-anonymity threshold of 100
- example: In March of 2020, there were 1000 occurrences of readers accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae from Chile
The file is ~10 MB and contains ~90000 bigrams across the 6 months, 96 countries, and 56 projects.
Contact
Please reach out to research-feedback@wikimedia.org for any questions.
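The bigram definition and the k-anonymity rule described above can be illustrated with a minimal Python sketch. This is not the code from the notebook mentioned in the description: the function names `session_bigrams` and `release_counts` and the toy sessions are hypothetical, and the sketch leaves out the grouping by month and country as well as the sampling and power-user filters.

```python
from collections import Counter

K = 100  # k-anonymity threshold stated in the dataset description


def session_bigrams(pages, covid_pages):
    """Yield (from, to) pairs of consecutively viewed pages
    where at least one of the two pages is COVID-19 related."""
    for a, b in zip(pages, pages[1:]):
        if a in covid_pages or b in covid_pages:
            yield (a, b)


def release_counts(raw_counts, k=K):
    """Drop counts below k and floor the remaining counts to a multiple of k."""
    return {key: (n // k) * k for key, n in raw_counts.items() if n >= k}


# Hypothetical toy data: each session is an ordered list of page titles.
covid_pages = {"SARS-CoV-2"}
sessions = (
    [["SARS-CoV-2", "Orthocoronavirinae", "Virus"]] * 1042
    + [["Virus", "SARS-CoV-2"]] * 99
)

counts = Counter()
for pages in sessions:
    counts.update(session_bigrams(pages, covid_pages))

released = release_counts(counts)
```

In this toy run the pair seen 1042 times is kept and floored, the pair seen 99 times falls below the threshold and is dropped, and the pair with no COVID-19 related page is never counted.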
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Phishing is a word derived from 'fishing'. It refers to an attack in which the attacker lures users to a fake Web site by sending them forged e-mails (or instant messages) and stealthily obtains the victim's personal information, such as user name, password, and national security ID. This information can then be used for targeted advertisements or even identity theft attacks (e.g., transferring money from the victim's bank account). The most frequently used attack method is to send potential victims e-mails that appear to come from banks, online organizations, or ISPs. These e-mails make up a pretext, e.g. that the password of your credit card has been mis-entered many times, or that an upgrade service is being offered, to lure you to the attacker's Web site to confirm or modify your account number and password through the hyperlink provided in the e-mail (Leon, 2008).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Android is the most popular operating system on current mobile smart devices. Many Android applications have been developed for it and have become an essential part of our daily lives. Unfortunately, this endless stream of applications has been accompanied by different kinds of Android malware, which can be installed through API calls, granted permissions, and extra package installations, and which undermine system security. It is therefore essential to detect and classify Android malware in order to protect users' privacy and limit the damage. Much research has already been done on techniques for Android malware detection and classification. In this work, we present AMDDLmodel, a deep learning technique based on a convolutional neural network. The model is configured through different parameters, including filter sizes, number of epochs, learning rates, and layers, to detect and classify Android malware. The Drebin dataset, consisting of 215 features, was used to evaluate the model. The model achieves an accuracy of 99.92%; the other reported metrics are precision, recall, and F1-score. AMDDLmodel introduces innovative deep learning for Android malware detection, enhancing accuracy and practical user security through inventive feature engineering and comprehensive performance evaluation. AMDDLmodel shows the highest accuracy compared with existing techniques.
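The metrics named in the abstract have standard definitions for a binary malware/benign classifier. The sketch below computes them from confusion-matrix counts; the function name `classification_metrics` and the example counts are illustrative assumptions and are not results from the AMDDLmodel evaluation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. A ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
```

On imbalanced data, accuracy alone can look high even when many malicious samples are missed, which is why precision, recall, and F1-score are usually reported alongside it.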
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Android Phones’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/khaiid/android-phones on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Android is the most used operating system in the mobile phone field. It would be interesting to explore the different manufacturers and devices that use it and which versions of the Android operating system are widely used.
The data has about 1300 rows with 4 attributes, described as follows:
Name: Mobile phone name
Brand: Manufacturer brand name
Release: Release date of the mobile
Version: Android version of the mobile
How many phones use Android 11? Which phones were released the latest? Which brand has the most phones released? How many brands are there?
This data uses material from https://en.wikipedia.org/wiki/List_of_Android_smartphones, which is released under the Creative Commons Attribution-Share-Alike License 3.0
--- Original source retains full ownership of the source dataset ---
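The four questions posed in the description could be answered with a few lines of Python. The sketch below is a minimal illustration using only the standard library. The column names come from the description, but the sample rows, the `summarize` helper, the ISO `YYYY-MM-DD` date format, and the `"Android 11"` version label are assumptions, because the description does not say how the actual file encodes these values.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names come from the description.
SAMPLE_CSV = """\
Name,Brand,Release,Version
Phone A,BrandX,2021-03-01,Android 11
Phone B,BrandX,2020-09-15,Android 10
Phone C,BrandY,2021-06-20,Android 11
"""


def summarize(csv_file, version_label="Android 11"):
    """Answer the four questions for a CSV with Name/Brand/Release/Version columns.

    Assumes Release holds ISO dates (YYYY-MM-DD), which sort correctly as text.
    """
    rows = list(csv.DictReader(csv_file))
    brand_counts = Counter(row["Brand"] for row in rows)
    latest_date = max(row["Release"] for row in rows)
    return {
        "android_11": sum(1 for row in rows if row["Version"].strip() == version_label),
        "latest": [row["Name"] for row in rows if row["Release"] == latest_date],
        "top_brand": brand_counts.most_common(1)[0][0],
        "n_brands": len(brand_counts),
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
```

To run this against the real file, pass an opened file object instead of the `io.StringIO` sample and adjust `version_label` and the date handling to match the values the file actually contains.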