MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.
Data Creation
The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.
We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.
Data Structure
The dataset has two main files.
myket.csv
: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels or interaction features, so the associated columns are all zero.
app_info_sample.csv
: This file comprises features associated with the applications present in the sample. For each application, information such as the approximate number of installs, average rating, count of ratings, and category is included. These features provide insights into the applications present in the dataset.
Dataset Details
For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.
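As a minimal loading sketch (assuming a JODIE-style column order of user, item, timestamp followed by the all-zero label and feature columns; the exact header names are not guaranteed, so positional access is used):

```python
import pandas as pd

# Load the interaction file and inspect it; column names/order follow the
# JODIE-style layout described above and are assumptions, not guarantees.
df = pd.read_csv("myket.csv")
print(df.head())

# Sanity checks against the statistics reported below.
print("users:", df.iloc[:, 0].nunique())   # expected 10,000
print("items:", df.iloc[:, 1].nunique())   # expected 7,988
print("interactions:", len(df))            # expected 694,121
```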
Top 20 Most Installed Applications
Package Name | Count of Interactions |
---|---|
com.instagram.android | 15292 |
ir.resaneh1.iptv | 12143 |
com.tencent.ig | 7919 |
com.ForgeGames.SpecialForcesGroup2 | 7797 |
ir.nomogame.ClutchGame | 6193 |
com.dts.freefireth | 6041 |
com.whatsapp | 5876 |
com.supercell.clashofclans | 5817 |
com.mojang.minecraftpe | 5649 |
com.lenovo.anyshare.gps | 5076 |
ir.medu.shad | 4673 |
com.firsttouchgames.dls3 | 4641 |
com.activision.callofduty.shooter | 4357 |
com.tencent.iglite | 4126 |
com.aparat | 3598 |
com.kiloo.subwaysurf | 3135 |
com.supercell.clashroyale | 2793 |
co.palang.QuizOfKings | 2589 |
com.nazdika.app | 2436 |
com.digikala | 2413 |
Comparison with SNAP Datasets
The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the JODIE project. The table below provides a comparative overview of the key dataset characteristics:
Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User |
---|---|---|---|---|---|
Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6 |
LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2 |
Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9 |
Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2 |
MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3 |
The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike the LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains comparatively few repetitive interactions. This characteristic reflects the diverse nature of user behaviors in the Android application market environment.
Citation
If you use this dataset in your research, please cite the following preprint:
@misc{loghmani2023effect,
title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
author={Erfan Loghmani and MohammadAmin Fazli},
year={2023},
eprint={2308.06862},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Protection against ransomware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its monetization potential for attackers. In "Extinguishing Ransomware - A Hybrid Approach to Android Ransomware Detection" (see references for details), we describe a hybrid (static + dynamic) malware detection method that achieves extremely good accuracy (100% detection rate, with a false positive rate below 4%).
We release a dataset related to the dynamic detection part of the aforementioned method, containing execution traces of Android ransomware applications, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 666 ransomware applications taken from the Heldroid project [https://github.com/necst/heldroid] (the app repository is unavailable at the moment). Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 20,000 stimuli were applied, with a maximum execution time of 15 minutes. For most of the applications, all the stimuli could be applied in this timeframe; in some traces, neither limit was reached due to emulator hiccups. Collected features relate to memory and CPU usage, network interaction, and system calls, and are sampled with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference from previously run samples (e.g., changed settings, running processes, and modifications of operating system files), the Android operating system was re-initialized before each application was run. The application execution process was automated by means of a shell script, run on a Linux PC, that made use of the Android Debug Bridge (adb). The Monkey application exerciser was used in the script as the generator of the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.
In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:
ransom-per_app-csv.zip - features obtained by executing ransomware applications, one CSV per application
ransom-unified-csv.zip - features obtained by executing ransomware applications, only one CSV file
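A hypothetical loading sketch for the unified archive (the names of the CSVs inside the zip and their exact column layout are not documented above, so this only inspects the data rather than assuming specific feature names):

```python
import zipfile
import pandas as pd

# Open the unified ransomware CSV directly from inside the zip archive.
with zipfile.ZipFile("ransom-unified-csv.zip") as zf:
    csv_name = next(n for n in zf.namelist() if n.endswith(".csv"))
    with zf.open(csv_name) as f:
        df = pd.read_csv(f)

print(df.shape)
print(df.columns.tolist())  # memory/CPU, network, and system-call features
```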
Mobile phone use logs from 279 Android phone users for an average duration of four weeks during summer 2016.
As COVID-19 continues to spread across the world, a growing number of malicious campaigns are exploiting the pandemic. COVID-19 is reportedly being used in a variety of online malicious activities, including email scams, ransomware, and malicious domains. As the number of afflicted cases continues to surge, malicious campaigns that use coronavirus as a lure are increasing, and malicious developers are taking advantage of the opportunity to lure mobile users into downloading and installing malicious apps.
However, besides a few media reports, coronavirus-themed mobile malware has not been well studied. Our community lacks a comprehensive understanding of the landscape of coronavirus-themed mobile malware, and no accessible dataset has been available to researchers to boost COVID-19 related cybersecurity studies.
We have made an effort to create a daily-growing COVID-19 related mobile app dataset. As of mid-November, we have curated a dataset of 4,322 COVID-19 themed apps, 611 of which are considered malicious. The number grows daily, and the dataset is updated weekly. For more details, please visit https://covid19apps.github.io
This dataset includes the following files:
(1) covid19apps.xlsx
In this file, we list the information for all the COVID-19 themed apps, including APK file hashes, release date, package name, AV-Rank, etc.
(2) covid19apps.zip
The APK samples of the COVID-19 themed apps are provided in zip files. To reduce the size of any single file, we split the samples across multiple zip archives. Each APK file is named after its SHA256 hash.
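A sketch of how the spreadsheet might be inspected (the real column names in covid19apps.xlsx are not documented here, so the "AV-Rank" column below is an assumption based on the description above):

```python
import pandas as pd

# Load the app metadata spreadsheet (requires openpyxl).
apps = pd.read_excel("covid19apps.xlsx")
print(apps.columns.tolist())    # inspect the actual column names first

# For example, treat apps flagged by at least one AV engine as malicious.
malicious = apps[apps["AV-Rank"] > 0]
print(len(malicious))           # 611 malicious apps reported as of mid-November
```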
If your papers or articles use our dataset, please use the following bibtex reference to cite our paper: https://arxiv.org/abs/2005.14619
(Accepted to Empirical Software Engineering)
@misc{wang2021virus,
title={Beyond the Virus: A First Look at Coronavirus-themed Mobile Malware},
author={Liu Wang and Ren He and Haoyu Wang and Pengcheng Xia and Yuanchun Li and Lei Wu and Yajin Zhou and Xiapu Luo and Yulei Sui and Yao Guo and Guoai Xu},
year={2021},
eprint={2005.14619},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Protection against malware is particularly relevant on systems running the Android operating system, due to its huge user base and, therefore, its monetization potential for attackers.
Dynamic malware detection has been widely adopted by the scientific community but not yet in practical applications.
We release DYNAMISM (Dynamic Analysis of Malware), a dataset containing execution traces of both benign and malicious applications running on Android OS, in order to facilitate further research as well as the adoption of dynamic detection in practice. The dataset contains execution traces from 2,386 benign applications and 2,495 malicious applications taken from the Malware Genome Project repository [http://www.malgenomeproject.org] and from the Drebin Dataset [https://www.sec.cs.tu-bs.de/~danarp/drebin/]. Execution records were obtained by running the applications, one at a time, on the Android emulator. For each application, a maximum of 2,000 stimuli were applied, with a maximum execution time of 10 minutes. For most of the applications, all the stimuli could be applied in this timeframe; in some traces, neither limit was reached due to emulator hiccups. Collected features relate to memory and CPU usage, network interaction, and system calls, and are sampled with a period of two seconds. The Android emulator of the Android Software Development Kit for Android 4.0 (release 20140702) was used. To guarantee that the system was always in mint condition when a new sample was started, thus avoiding possible interference from previously run samples (e.g., changed settings, running processes, and modifications of operating system files), the Android operating system was re-initialized before each application was run. The application execution process was automated by means of a shell script, run on a Linux PC, that made use of the Android Debug Bridge (adb). The Monkey application exerciser was used in the script as the generator of the aforementioned stimuli. Monkey is a command-line tool that can be run on any emulator instance or device; it sends a pseudo-random stream of user events (stimuli) into the system, which acts as a stress test on the application software.
In this dataset, we provide both per-app CSV files as well as unified files, in which CSV files of single applications have been concatenated. The CSV files contain the features extracted from the raw execution record. The provided files are listed below:
benign-per_app-csv.zip - features obtained by executing benign applications, one CSV per application
benign-unified-csv.zip - features obtained by executing benign applications, only one CSV file
malicious-per_app-csv.zip - features obtained by executing malicious applications, one CSV per application
malicious-unified-csv.zip - features obtained by executing malicious applications, only one CSV file
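Once the unified archives are extracted, a labeled table for a benign-vs-malicious classifier could be assembled along these lines (the CSV file names below assume extraction and are hypothetical):

```python
import pandas as pd

# Combine benign and malicious execution-trace features with class labels.
benign = pd.read_csv("benign-unified.csv")
malicious = pd.read_csv("malicious-unified.csv")
benign["label"] = 0
malicious["label"] = 1
data = pd.concat([benign, malicious], ignore_index=True)
print(data["label"].value_counts())
```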
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ORBIT-India (Object Recognition for Blind Image Training - India) dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world's population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.
Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin) and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.
The image dataset is stored in the ‘Dataset’ folder, organized into folders assigned to the data collectors (P1, P2, ..., P12). Each collector's folder includes sub-folders named with the object labels provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing one JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) whose keys correspond to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; an unscaled version of the dataset will follow soon.
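A minimal sketch of reading the annotations in the layout just described, keeping only frames in which the target object is visible:

```python
import json
from pathlib import Path

# Layout described above:
# Dataset/P<k>/<object>/{clean,clutter}/*.jpeg plus Annotations/*.json
for ann_file in Path("Annotations").glob("*.json"):
    with open(ann_file) as f:
        frames = json.load(f)
    # Keep only frames where the object is actually present.
    usable = [name for name, flags in frames.items()
              if not flags["object_not_present_issue"]]
    print(ann_file.name, len(usable), "usable frames")
```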
This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.
REFERENCES:
[1] Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597
[2] microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset
[3] Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1-6. https://doi.org/10.1145/3613905.3648641
AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Portuguese Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Portuguese language.
Dataset Content & Diversity: Containing more than 2,000 images, this Portuguese OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text, including sentences and individual words for item names, quantities, comments, etc., on shopping lists. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting style. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Portuguese text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these shopping lists were written and images were captured by native Portuguese people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata: In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Portuguese text recognition models.
Update & Custom Collection: We are committed to continually expanding this dataset by adding more images with the help of our native Portuguese crowd community.
If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Portuguese language. Your journey to improved language understanding and processing begins here.
This is the dataset of MACs of Bluetooth and Wi-Fi access points collected by the University of Illinois Movement (UIM) framework using Google Android phones. Contact: klara@illinois.edu
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
We are publishing a walking activity dataset including inertial and positioning information from 19 volunteers, with reference distance measured using a trundle wheel. The dataset includes a total of 96.7 km walked by the volunteers, split into 203 separate tracks. The trundle wheel is of two types: either an analogue trundle wheel, which provides the total number of meters walked in a single track, or a sensorized trundle wheel, which measures every revolution of the wheel, thereby recording a continuous incremental distance.
Each track has data from the accelerometer and gyroscope embedded in the phones, location information from the Global Navigation Satellite System (GNSS), and the step count obtained by the device. The dataset can be used to implement walking distance estimation algorithms and to explore data quality in the context of walking activity and physical capacity tests, fitness, and pedestrian navigation.
Methods
The proposed dataset is a collection of walks in which participants used their own smartphones to capture inertial and positioning information. The participants involved in the data collection come from two sites. The first site is the Oxford University Hospitals NHS Foundation Trust, United Kingdom, where 10 participants (7 affected by cardiovascular diseases and 3 healthy individuals) performed unsupervised six-minute walk tests (6MWTs) in an outdoor environment of their choice (ethical approval obtained from the UK National Health Service Health Research Authority; protocol reference number: 17/WM/0355). All participants involved provided informed consent. The second site is Malmö University, in Sweden, where a group of 9 healthy researchers collected data. This dataset can be used by researchers to develop distance estimation algorithms and to study how data quality impacts the estimation.
All walks were performed holding a smartphone in one hand, with an app collecting inertial data, the GNSS signal, and the step count. In the other hand, participants held a trundle wheel to obtain the ground-truth distance. Two different trundle wheels were used: an analogue trundle wheel, which registered a single total value of walked distance, and a sensorized trundle wheel, which collected timestamps and distance at every 1-meter revolution, resulting in continuous incremental distance information. The latter configuration is innovative and allows temporal windows of the IMU data to be used as input to machine learning algorithms that estimate walked distance. For the data collected by researchers, if walks were done simultaneously and at a close distance from each other, only one person used the trundle wheel, and the reference distance was associated with all walks collected at the same time.
The walked paths are of variable length, duration, and shape. Participants were instructed to walk paths of increasing curvature, from straight to rounded. Irregular paths are particularly useful for determining limitations in the accuracy of walked-distance algorithms. Two smartphone applications were developed for collecting the information of interest from the participants' devices, both available for the Android and iOS operating systems. The first is a web application that retrieves inertial data (acceleration, rotation rate, orientation) while connecting to the sensorized trundle wheel to record incremental reference distance [1]. The second is the Timed Walk app [2], which guides the user in performing a walking test by signalling when to start and when to stop the walk while collecting both inertial and positioning data. All participants in the UK used the Timed Walk app.
The data collected during the walk comes from the Inertial Measurement Unit (IMU) of the phone and, when available, the Global Navigation Satellite System (GNSS). In addition, the step count is retrieved from the sensors embedded in each participant's smartphone. With the dataset, we provide a descriptive table with the characteristics of each recording, including the brand and model of the smartphone, duration, reference total distance, and types of signals included, additionally scoring some relevant parameters related to the quality of the various signals. The path curvature is one of the most relevant parameters: previous literature from our team confirmed the negative impact of curved paths on multiple distance estimation algorithms [3]. We visually inspected the walked paths and clustered them into three groups: a) straight paths, i.e. no turns wider than 90 degrees; b) gently curved paths, i.e. between one and five turns wider than 90 degrees; and c) curved paths, i.e. more than five turns wider than 90 degrees. Other features relevant to the quality of the collected signals are the total time during which inertial and GNSS data were missing beyond a threshold (0.05 s and 6 s, respectively) due to technical issues or the app going into the background and losing access to the sensors, the sampling frequency of the different data streams, the average walking speed, and the smartphone position. The start of each walk is set to 0 ms, so no absolute time information is reported. Walk locations collected in the UK are anonymized using the following approach: the first position is fixed to a central location of the city of Oxford (latitude: 51.7520, longitude: -1.2577), and all other positions are reassigned by applying a translation along the longitudinal and latitudinal axes that maintains the original distance and angle between samples. This way, the exact geographical location is lost, but the path shape and distances between samples are maintained. The difference between consecutive points "as the crow flies" and the path curvature were numerically and visually inspected to verify that they match the original walks. Computations were made possible by the Haversine Python library.
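The distance-preservation check described above can be sketched with the Haversine library the authors mention (a minimal example on a toy track; the (lat, lon) fixes below are illustrative, not taken from the dataset):

```python
from haversine import haversine, Unit  # pip install haversine

def consecutive_distances(points):
    """Great-circle distance in meters between consecutive (lat, lon) fixes."""
    return [haversine(p, q, unit=Unit.METERS) for p, q in zip(points, points[1:])]

# Toy track near the fixed Oxford anchor used for anonymization.
track = [(51.7520, -1.2577), (51.7523, -1.2570), (51.7528, -1.2565)]
# After the translation-based anonymization, these per-step distances
# should match those computed on the original, non-anonymized track.
print(consecutive_distances(track))
```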
Multiple datasets are available regarding walking activity recognition among other daily living tasks. However, few published studies provide datasets that focus on walked distance in both indoor and outdoor environments and that include relevant ground-truth information. Yan et al. [4] introduced an inertial walking dataset for indoor scenarios, collected by six healthy participants with a smartphone placed in four positions (on the leg, in a bag, in the hand, and on the body). The reference measurement used in this study is a visual odometry system embedded in a smartphone worn at chest level with a strap. While interesting and detailed, this dataset lacks GNSS data, which is likely to be used in outdoor scenarios, and the reference used for localization also suffers from accuracy issues, especially outdoors. Vezovcnik et al. [5] analysed estimation models for step length and provided an open-source dataset totalling 22 km of inertial-only walking data from 15 healthy adults. While relevant, their dataset focuses on steps rather than total distance and was acquired on a treadmill, which limits its validity in real-world scenarios. Kang et al. [6] proposed a way to estimate travelled distance using an Android app that matches outdoor walking patterns to indoor contexts for each participant. They collected data outdoors, including both inertial and positioning information, and used average speed values obtained from the GPS data as reference labels. They then used deep learning models to estimate walked distance, obtaining high performance. They report that 3% to 11% of the data for each participant was discarded due to low quality. Unfortunately, the name of the app is not reported, and the paper does not mention whether the dataset can be made available.
This dataset is heterogeneous in multiple respects. It includes a majority of healthy participants; therefore, it is not possible to generalize the outcomes from this dataset to all walking styles or physical conditions. The dataset is also heterogeneous from a technical perspective, given the differences in devices, acquired data, and smartphone apps used (e.g., some tests lack IMU or GNSS data, and the sampling frequency on iPhones was particularly low). We suggest selecting the appropriate tracks based on the desired characteristics to obtain reliable and consistent outcomes.
This dataset allows researchers to develop algorithms to compute walked distance and to explore data quality and reliability in the context of walking activity. The dataset was initiated to investigate the digitalization of the 6MWT; however, the collected information can also be useful for other physical capacity tests that involve walking (distance- or duration-based), or for other purposes such as fitness and pedestrian navigation.
The article related to this dataset will be published in the proceedings of the IEEE MetroXRAINE 2024 conference, held in St. Albans, UK, 21-23 October 2024.
This research is partially funded by the Swedish Knowledge Foundation and the Internet of Things and People research center through the Synergy project Intelligent and Trustworthy IoT Systems.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set comprises an experiment carried out with nine Android devices, named Copelabs1 through Copelabs8 and Copelabs12. The devices were carried by people sharing the same affiliation during their daily routines (commuting between home and office, going to leisure activities, attending meetings in the office). All the data was collected at one-minute intervals. We set up the experiments using Samsung Galaxy S3 devices. For each experiment, there is the following set of data files:
* SocialProximity.dat has six columns: Timestamp, DeviceName, Encounter Duration, Average Encounter Duration, Social Strength (per hour), and Social Strength (per minute) towards the encountered device
* DistanceOutput.dat has three columns: Timestamp, DeviceName, and Distance towards the encountered device
* Microphone.dat has two columns: Timestamp and Sound Level (QUIET, NORMAL, ALERT, or NOISY)
* PhysicalActivity.dat has two columns: Timestamp and Activity (STATIONARY, WALKING, or RUNNING)
The experiment was conducted over a period of 12 days, from 12 to 23 September 2016.
BSD-3-Clause: https://opensource.org/licenses/BSD-3-Clause
- This archive contains the files submitted to the 2nd International
Workshop on Data: Acquisition To Analysis (DATA) at SenSys. Files
provided in this package are associated with the paper titled
"Dataset: User side acquisition of People-Centric Sensing in the
Internet-of-Things"
- Content of the package:
+ 1_beacon_table.pkl: The beacon table in Pickle format. It contains
20612286 data points where each data point represents a Bluetooth
beacon with 15 attributes as follows: <_id, host_id, ble_address,
sound_avg_peak, sound_max_peak, sound_count_over_thres_per_frame,
sound_avg_all, sound_avg_over_thres, temperature, humidity,
pressure, eco2_ppm, tvoc_ppb, rssi, timestamp>.
+ 2_device_description_table.pkl: The device description table
provides the mapping between a device's Bluetooth address and its
physical identity (device_id, description, type).
+ 3_checkin_table.pkl: The check-in table provides a timeseries of
user interactions with three Android tablets (i.e. tuples of
+ 4_sample_beacon_table.pkl: The sample beacon table in Pickle
format. It contains 1000 data points where each data point
represents a Bluetooth beacon with 15 attributes as follows: <_id,
host_id, ble_address, sound_avg_peak, sound_max_peak,
sound_count_over_thres_per_frame, sound_avg_all,
sound_avg_over_thres, temperature, humidity, pressure, eco2_ppm,
tvoc_ppb, rssi, timestamp>.
+ 5_sample_device_description_table.pkl: The sample device description
table provides the mapping between a device's Bluetooth address and
its physical identity (device_id, description, type).
+ 6_sample_checkin_table.pkl: The check-in table provides a
timeseries of user interactions with three Android tablets
(i.e. tuples of
+ print_table_heads.py: A Python script which fetches Pickle tables
as DataFrames and prints out the sample entries.
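  For example, a table can be loaded as a DataFrame directly with pandas
  (a minimal sketch mirroring what print_table_heads.py does):

```python
import pandas as pd

# The tables ship as pickled DataFrames; load and preview the sample beacons.
beacons = pd.read_pickle("4_sample_beacon_table.pkl")
print(beacons.head())   # 15 attributes per Bluetooth beacon
print(len(beacons))     # 1000 sample data points
```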
- ACM Reference Format: Chenguang Liu, Jie Hua, Tomasz Kalbarczyk,
Sangsu Lee, and Christine Julien. 2019. Dataset: User side
acquisition of People-Centric Sensing in the Internet-of-Things. In
The 2nd Workshop on Data Acquisition To Analysis (DATA’19), November
10, 2019, New York, NY, USA. ACM, New York, NY, USA, 3 pages.
https://doi.org/10.1145/3359427.3361914
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.
Dataset Collection
We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing, and/or updating apps), we collected two snapshots. The dataset D1 contains the apps available in the Google Play Store in January 2014; we then created a second snapshot (D2) of the Google Play Store in March 2014.
The apps belong to the 27 categories defined by Google (at the time of writing the paper) and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median number of 1,978 apps per category.
For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.
In addition, for each app, we collected up to the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered those associated with the latest version of the app, i.e., we discarded unversioned and old-versioned reviews. This resulted in a corpus of 1,402,717 reviews (Jan. 2014).
Dataset Stats
Some stats about the datasets:
D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.
D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.
Additional stats about the datasets are available here.
Dataset Description
To store the dataset, we created a graph database with Neo4j. The dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).
In particular, our dataset graph contains six types of nodes:
- APP nodes containing the metadata of each app,
- PERMISSION nodes describing permission types,
- CATEGORY nodes describing app categories,
- SUBCATEGORY nodes describing app subcategories,
- USER_REVIEW nodes storing user reviews,
- TOPIC nodes describing topics mined from user reviews (using LDA).
Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:
Dataset Files Info
Neo4j 2.0 Databases
googlePlayDB1-Jan2014_neo4j_2_0.rar
googlePlayDB2-Mar2014_neo4j_2_0.rar
We provide two Neo4j databases containing the two snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. They were created with Neo4j 2.0, specifically with the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available at the time of implementing the paper in 2014).
Neo4j 3.5 Databases
googlePlayDB1-Jan2014_neo4j_3_5_28.rar
googlePlayDB2-Mar2014_neo4j_3_5_28.rar
Neo4j 2.0 is now deprecated and no longer available for download from the official Neo4j Download Center. We have therefore migrated the original Neo4j 2.0 databases to Neo4j 3.5.28. These databases can be opened with 'Neo4j Community Edition 3.5.28', which can be downloaded from the official Neo4j Download page.
To open the databases with more recent versions of Neo4j, the databases must first be migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide.
The first time the Neo4j database is connected, it may request credentials. The username and password are: neo4j/neo4j
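As a minimal connection sketch (assuming a local Neo4j 3.5 instance serving one of the migrated databases on the default bolt port, with the default neo4j/neo4j credentials; the APP label is the one described above):

```python
from neo4j import GraphDatabase  # pip install neo4j

# Connect to the local database and count the APP nodes.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
with driver.session() as session:
    result = session.run("MATCH (a:APP) RETURN count(a) AS n")
    print("APP nodes:", result.single()["n"])  # 38,781 in D1; 46,644 in D2
driver.close()
```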
AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Dutch Sticky Notes Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Dutch language.
Dataset Content & Diversity: Containing more than 2,000 images, this Dutch OCR dataset offers a wide distribution of different types of sticky note images. Within this dataset, you'll discover a variety of handwritten text, including quotes, sentences, and individual words on sticky notes. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting style. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Dutch text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these sticky notes were written and images were captured by native Dutch people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata: In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Dutch text recognition models.
Update & Custom Collection: We are committed to continually expanding this dataset by adding more images with the help of our native Dutch crowd community.
If you require a customized OCR dataset containing sticky note images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage this sticky notes image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Dutch language. Your journey to improved language understanding and processing begins here.
In 2022, around 1.39 billion smartphones were sold worldwide, with this number forecast to drop to 1.34 billion in 2023.
Smartphone penetration rate still on the rise
Less than half of the world’s total population owned a smart device in 2016, but the smartphone penetration rate has continued climbing, reaching 78.05 percent in 2020. By 2025, it is forecast that almost 87 percent of all mobile users in the United States will own a smartphone, an increase from the 27 percent of mobile users in 2010.
Smartphone end user sales
In the United States alone, sales of smartphones were projected to be worth around 73 billion U.S. dollars in 2021, an increase from 18 billion dollars in 2010. Global sales of smartphones are expected to increase from 2020 to 2021 in every major region, as the market starts to recover from the initial impact of the coronavirus (COVID-19) pandemic.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mobile devices, especially smartphones, have gained high popularity and become a part of daily life in recent years. Smartphones have built-in motion sensors such as accelerometer, gyroscope, and orientation sensors. Recent research on smartphones shows that behavioral biometrics can be obtained from smartphone motion sensors. In this context, we developed an Android application that collects accelerometer sensor data while the user plays a game. This application records all accelerometer data and touch event information while users touch the screen. We performed two experiments and collected two different datasets using this application. In the first experiment, we collected data from 107 child users aged 4 to 11 and 100 adult users aged 16 to 55. This dataset includes more than 11,000 tap records for child and adult users in total. In the second experiment, data was collected from 60 female and 60 male users aged 17-57 for different activities such as sitting and walking. There are more than 6,000 tap records for each of the sitting and walking scenarios in the second dataset. We used popular Android smartphones in the experiments, all with a 100 Hz sampling rate. This data can be used for behavioral biometric analyses such as user age group and gender detection, user identification and authentication, or tap event detection.
This is a GPS dataset acquired from Google.
Google tracks the user’s device location through Google Maps, which also works on Android devices, the iPhone, and the web.
It’s possible to see the Timeline from the user’s settings in the Google Maps app on Android or directly from the Google Timeline Website.
It has detailed information such as when an individual is walking, driving, and flying.
Such functionality of tracking can be enabled or disabled on demand by the user directly from the smartphone or via the website.
Google has a Takeout service where users can download all their data, or select the data they want to download from the Google products they use.
The dataset contains 120,847 instances covering a period of 9 months, or 253 unique days, from February 2019 to October 2019, from a single user.
Each instance comprises a (latitude, longitude) pair and a timestamp.
All the data was delivered in a single CSV file.
As the locations in this dataset are well known by the researchers, it can serve as ground truth in mobility studies.
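A loading sketch under assumptions (the column names of the delivered CSV are not documented above, so "timestamp", "latitude", and "longitude" below are hypothetical):

```python
import pandas as pd

# Load the single CSV of location fixes and check the reported coverage.
df = pd.read_csv("locations.csv", parse_dates=["timestamp"])
print(len(df))                             # expect 120,847 instances
print(df["timestamp"].dt.date.nunique())   # expect 253 unique days
```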
Please cite the following papers in order to use the datasets:
T. Andrade, B. Cancela, and J. Gama, "Discovering locations and habits from human mobility data," Annals of Telecommunications, vol. 75, no. 9, pp. 505–521, 2020.
DOI: 10.1007/s12243-020-00807-x
and
T. Andrade, B. Cancela, and J. Gama, "From mobility data to habits and common pathways," Expert Systems, vol. 37, no. 6, p. e12627, 2020.
DOI: 10.1111/exsy.12627
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Paper: https://arxiv.org/abs/2302.01751 Code: https://github.com/SamsungLabs/MotionID
Dataset (part 2/3) for the Motion Patterns Identification part of MotionID: Human Authentication Approach. Data type: bin (to be converted with the attached notebook).
Six users, each with a Samsung Galaxy S10e smartphone, collected IMU data every day for two weeks. At the end of the two weeks, the users switched smartphones with each other and restarted the process. Each user spent two weeks per smartphone during the whole data collection process, which took 12 weeks in total. Throughout the experiment, the Galaxy S10e was the main and only device of each user. The smartphones were used habitually and ordinarily, with the only difference from real-life scenarios being that the data collection app was always on.
Each measurement has a corresponding timestamp. The Screen.txt file consists of timestamps with the current status of the device: 1) SCREEN_OFF - the phone's screen turned off; 2) SCREEN_ON - the screen is on; 3) USER_PRESENT - the phone has just been unlocked. Details here: https://developer.android.com/reference/android/content/Intent
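A hypothetical parsing sketch for Screen.txt (assuming one "timestamp status" pair per line with millisecond epoch timestamps; the actual file layout may differ):

```python
from datetime import datetime, timezone

# Count unlock events (USER_PRESENT) in the screen-status log.
unlocks = 0
with open("Screen.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue
        ts_ms, status = parts[0], parts[1]
        if status == "USER_PRESENT":
            unlocks += 1
            t = datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)
            print("unlocked at", t.isoformat())
print("total unlocks:", unlocks)
```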
Links to the other datasets of MotionID:
User Verification: https://www.kaggle.com/datasets/djaarf/motionid-imu-specific-motion
Motion Patterns Identification Part 1/3: https://www.kaggle.com/datasets/djaarf/motionid-imu-all-motions-part1
Motion Patterns Identification Part 3/3: https://www.kaggle.com/datasets/djaarf/motionid-imu-all-motions-part3
Infant Crying smartphone speech dataset, collected with Android smartphones and iPhones, covering infant crying. The dataset was collected from an extensive and geographically diverse pool of speakers (201 people in total, with a balanced gender distribution), enhancing model performance in real and complex tasks. Quality was tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
I got the chance to walk across the Golden Gate Bridge in San Francisco, CA for the first time in 22 years on May 12, 2018. There have been a great many technological advancements since then, as now we are all walking around with powerful computers and sensors in our pockets. I decided it would be fun to measure the bridge and provide others the opportunity to analyze data as to its motion for a brief snippet of time.
This is one minute of data from the "g-force Meter" of the Physics Toolbox Suite v1.8.6 for Android. The data was collected from a Pixel 2 phone on the east side of the Golden Gate Bridge at the midpoint between the two towers of the bridge at approximately 3:15 PM local time on May 12, 2018.
This dataset is hereby owned by the community under the terms of a very lenient license, on the condition that I have published it shortly after recording it.
Maybe this will inspire people to install sensors on the bridge, and other bridges, to monitor for things such as traffic (such as to find an optimum speed limit), dangerous fatigue, or dangerous wind conditions. At the very least, one could use this in comparison with a baseline stable motion to see how the bridge shakes. One could study effects of vehicles traversing the bridge (not that there's any visual data for when that happened relative to this dataset, but I do believe at least one big bus drove by my device during this recording). One could study if there are periodic vibrations, and if so, at what frequencies. This would be even more interesting if correlated with wind data and run compared to several different wind speeds.
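As a minimal sketch of that frequency analysis (assuming a CSV export from the g-force Meter with a "time" column in seconds and a vertical-axis column named "gFz"; these names are hypothetical):

```python
import numpy as np
import pandas as pd

# Estimate the dominant vibration frequency from the vertical acceleration.
df = pd.read_csv("gforce_meter.csv")
t = df["time"].to_numpy()
az = df["gFz"].to_numpy()

fs = 1.0 / np.median(np.diff(t))                  # estimated sampling rate
spectrum = np.abs(np.fft.rfft(az - az.mean()))    # remove the DC component
freqs = np.fft.rfftfreq(len(az), d=1.0 / fs)
print("dominant frequency (Hz):", freqs[spectrum.argmax()])
```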