9 datasets found

Network Traffic Android Malware
kaggle.com
zip
Updated Sep 12, 2019
Cite
Christian Urcuqui (2019). Network Traffic Android Malware [Dataset]. https://www.kaggle.com/datasets/xwolf12/network-traffic-android-malware
Explore at:
Available download formats: zip (116603 bytes)
Dataset updated
Sep 12, 2019
Authors
Christian Urcuqui
Description
Introduction

Android is one of the most used mobile operating systems worldwide. Due to its technological impact, its open-source code, and the possibility of installing applications from third parties without any central control, Android has recently become a malware target. Although it includes security mechanisms, recent news about malicious activities and Android's vulnerabilities points to the importance of continuing to develop methods and frameworks to improve its security.

To prevent malware attacks, researchers and developers have proposed different security solutions, applying static analysis, dynamic analysis, and artificial intelligence. Indeed, data science has become a promising area in cybersecurity, since analytical models based on data allow for the discovery of insights that can help to predict malicious activities.

In this work, we propose to consider some network layer features as the basis for machine learning models that can successfully detect malware applications, using open datasets from the research community.

Content

This dataset is based on another dataset (DroidCollector), where you can get all the network traffic in pcap files. In our research, we preprocessed the files to extract the network features described in the following article:

López, C. C. U., Villarreal, J. S. D., Belalcazar, A. F. P., Cadavid, A. N., & Cely, J. G. D. (2018, May). Features to Detect Android Malware. In 2018 IEEE Colombian Conference on Communications and Computing (COLCOM) (pp. 1-6). IEEE.

Acknowledgements

Cao, D., Wang, S., Li, Q., Cheny, Z., Yan, Q., Peng, L., & Yang, B. (2016, August). DroidCollector: A High Performance Framework for High Quality Android Traffic Collection. In Trustcom/BigDataSE/ISPA, 2016 IEEE (pp. 1753-1758). IEEE.

Myket Android Application Install Dataset

paperswithcode.com

Updated Aug 12, 2023

Cite

Erfan Loghmani; Mohammadamin Fazli (2023). Myket Android Application Install Dataset [Dataset]. https://paperswithcode.com/dataset/myket-android-application-install

Dataset updated

Aug 12, 2023

Authors

Erfan Loghmani; Mohammadamin Fazli

Description

This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.

Data Creation

The dataset was initially generated by the Myket data team, and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.

We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.
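The three filtering steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' actual procedure: the function name, the tuple layout, the fixed seed, and the choice to count user interactions after the application filter are all assumptions.

```python
import random
from collections import Counter


def subsample(interactions, top_n_apps=8000, min_interactions=32,
              n_users=10_000, seed=0):
    """Filter a list of (user_id, app_name, timestamp) tuples.

    Hypothetical helper; parameter defaults mirror the numbers in the text.
    """
    # Step 1: keep only interactions with the most installed applications.
    app_counts = Counter(app for _, app, _ in interactions)
    top_apps = {app for app, _ in app_counts.most_common(top_n_apps)}
    kept = [row for row in interactions if row[1] in top_apps]

    # Step 2: retain users with MORE than `min_interactions` interactions
    # (assumption: counted after the application filter).
    user_counts = Counter(user for user, _, _ in kept)
    eligible = sorted(u for u, c in user_counts.items() if c > min_interactions)

    # Step 3: randomly select up to `n_users` of the eligible users.
    rng = random.Random(seed)
    chosen = set(rng.sample(eligible, min(n_users, len(eligible))))
    return [row for row in kept if row[0] in chosen]


# Toy data: apps "a" and "b" are the two most installed; only "u1"
# has more than two interactions with them.
toy = [
    ("u1", "a", 1), ("u1", "a", 2), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "b", 5),
    ("u3", "c", 6),
]
result = subsample(toy, top_n_apps=2, min_interactions=2, n_users=10)
```

With the toy thresholds, only the three rows belonging to "u1" survive; on the real data the defaults would apply instead.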

Data Structure

The dataset has two main files.

- myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, resulting in associated columns being all zero.
- app_info_sample.csv: This file comprises features associated with applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category are included. These features provide insights into the applications present in the dataset.
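As a rough illustration of reading an interaction file in this style, the sketch below parses a small in-memory sample. The column order (user, item, timestamp, state label, features), the header text, and the sample rows are assumptions modeled on the JODIE convention and are not taken from myket.csv itself; check the real header before relying on it.

```python
import csv
import io

# Made-up sample rows; the layout is an assumption, not the real file.
SAMPLE = """user_id,item_id,timestamp,state_label,comma_separated_list_of_features
0,0,0.0,0,0.0
0,1,36.0,0,0.0
1,1,77.0,0,0.0
"""


def read_interactions(handle):
    """Return (user, item, timestamp) triplets from a JODIE-style CSV."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        if not record:
            continue  # tolerate blank lines
        # Only the first three columns carry information here; the state
        # label and feature columns are described as all zero.
        rows.append((int(record[0]), int(record[1]), float(record[2])))
    return rows


rows = read_interactions(io.StringIO(SAMPLE))
```

Reading by position rather than by column name keeps the sketch independent of the exact header wording.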

Dataset Details

- Total Instances: 694,121 install interaction instances
- Instances Format: Triplets of user_id, app_name, timestamp
- 10,000 users and 7,988 Android applications
- Item features for 7,606 applications

For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.

Top 20 Most Installed Applications

| Package Name                       | Count of Interactions |
| ---------------------------------- | --------------------- |
| com.instagram.android              | 15292                 |
| ir.resaneh1.iptv                   | 12143                 |
| com.tencent.ig                     | 7919                  |
| com.ForgeGames.SpecialForcesGroup2 | 7797                  |
| ir.nomogame.ClutchGame             | 6193                  |
| com.dts.freefireth                 | 6041                  |
| com.whatsapp                       | 5876                  |
| com.supercell.clashofclans         | 5817                  |
| com.mojang.minecraftpe             | 5649                  |
| com.lenovo.anyshare.gps            | 5076                  |
| ir.medu.shad                       | 4673                  |
| com.firsttouchgames.dls3           | 4641                  |
| com.activision.callofduty.shooter  | 4357                  |
| com.tencent.iglite                 | 4126                  |
| com.aparat                         | 3598                  |
| com.kiloo.subwaysurf               | 3135                  |
| com.supercell.clashroyale          | 2793                  |
| co.palang.QuizOfKings              | 2589                  |
| com.nazdika.app                    | 2436                  |
| com.digikala                       | 2413                  |

Comparison with SNAP Datasets

The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the project. The table below provides a comparative overview of the key dataset characteristics:

Dataset	#Users	#Items	#Interactions	Average Interactions per User	Average Unique Items per User
Myket	10,000	7,988	694,121	69.4	54.6
LastFM	980	1,000	1,293,103	1,319.5	158.2
Reddit	10,000	984	672,447	67.2	7.9
Wikipedia	8,227	1,000	157,474	19.1	2.2
MOOC	7,047	97	411,749	58.4	25.3

The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.
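The per-user averages in the comparison table can be checked with a few lines of arithmetic. The sketch below recomputes the average interactions per user from the totals and divides it by the average unique items per user; that ratio is only an illustrative indicator of repetition introduced here, not a metric defined by the dataset authors.

```python
# (users, interactions, average unique items per user) copied from the table.
TABLE = {
    "Myket": (10_000, 694_121, 54.6),
    "LastFM": (980, 1_293_103, 158.2),
    "Reddit": (10_000, 672_447, 7.9),
    "Wikipedia": (8_227, 157_474, 2.2),
    "MOOC": (7_047, 411_749, 25.3),
}


def per_user_stats(users, interactions, avg_unique_items):
    """Return (average interactions per user, interactions per unique item)."""
    avg_interactions = interactions / users
    # Values near 1 mean users rarely return to the same item.
    repeat_ratio = avg_interactions / avg_unique_items
    return round(avg_interactions, 1), round(repeat_ratio, 1)


stats = {name: per_user_stats(*row) for name, row in TABLE.items()}
for name, (avg, ratio) in stats.items():
    print(f"{name:<10} avg interactions/user={avg:>7} per unique item={ratio}")
```

Myket comes out at roughly 1.3 interactions per unique item, against more than 8 for LastFM, Reddit, and Wikipedia, which matches the lower repetition described above.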

Citation

If you use this dataset in your research, please cite the following preprint:

@misc{loghmani2023effect,
  title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
  author={Erfan Loghmani and MohammadAmin Fazli},
  year={2023},
  eprint={2308.06862},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

The icsi/netalyzr-android dataset
impactcybertrust.org
Updated Jan 21, 2019
Cite
External Data Source (2019). The icsi/netalyzr-android dataset [Dataset]. http://doi.org/10.23721/100/1478847
Explore at:
Unique identifier
https://doi.org/10.23721/100/1478847
Dataset updated
Jan 21, 2019
Authors
External Data Source
Description
This dataset was collected by the ICSI Netalyzr app for Android to develop a characterization of how operational decisions, such as network configurations, business models, and relationships between operators, introduce diversity in service quality and affect user security and privacy. We delve in detail beyond the radio link and into network configuration and business relationships in six countries. We identify the widespread use of transparent middleboxes such as HTTP and DNS proxies, analyzing how they actively modify user traffic, compromise user privacy, and potentially undermine user security. In addition, we identify network sharing agreements between operators, highlighting the implications of roaming and characterizing the properties of MVNOs, including that a majority are simply rebranded versions of major operators. More broadly, our findings using this data highlight the importance of considering higher-layer relationships when seeking to analyze mobile traffic in a sound fashion.
Contact: narseo@icsi.berkeley.edu
ITC-Net-MingledApp: A comprehensive dataset of mixed mobile application...
data.mendeley.com
Updated Oct 7, 2024
Cite
Abolghasem Rezaei Khesal (2024). ITC-Net-MingledApp: A comprehensive dataset of mixed mobile application traffic for robust network traffic classification, domain adaptation, and generalization in diverse environments - Tehran Dataset #1 [Dataset]. http://doi.org/10.17632/9frgkybxhn.1
Explore at:
Unique identifier
https://doi.org/10.17632/9frgkybxhn.1
Dataset updated
Oct 7, 2024
Authors
Abolghasem Rezaei Khesal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Tehran
Description
This repository is part of the ITC-Net-MingledApp dataset, which includes network traffic data from 36 Android applications, with each capture featuring concurrent traffic from multiple applications and smartphones. This repository contains part #1 of the data for the Iran-Tehran scenario. Each capture is stored in a compressed file containing the relevant PCAP files of the associated applications. The PCAP files are named according to the convention {TimeStamp}_{Application Name}{Download-Upload Speed}.pcap. Part #2 of the Iran-Tehran scenario is in the Tehran Dataset #2 repository (https://doi.org/10.17632/zsffy3j9y6.1).
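A cautious way to take such a file name apart is to split only at the first underscore. The sketch below assumes the timestamp itself contains no underscore, and it leaves the application name and speed fused, because the description does not say how they are delimited. The example file name is invented for illustration.

```python
from pathlib import PurePosixPath


def split_capture_name(filename):
    """Split '{TimeStamp}_{rest}.pcap' into (timestamp, rest).

    Hypothetical helper: assumes the timestamp has no underscore and does
    not try to separate the application name from the speed suffix.
    """
    stem = PurePosixPath(filename).stem  # drop the ".pcap" extension
    timestamp, _, rest = stem.partition("_")
    return timestamp, rest


# Invented file name; real captures may use a different timestamp format.
parts = split_capture_name("20240101120000_ExampleApp10-2.pcap")
```

Separating the application name from the speed would need the actual naming details from the dataset's documentation.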
Brainport, Platooning, formation improvement by traffic light data
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
TNO; TASS; NXP; Technolution; NEVS (National Electric Vehicle Sweden) (2020). Brainport, Platooning, formation improvement by traffic light data [Dataset]. http://doi.org/10.5281/zenodo.3606608
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3606608
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
TNO; TASS; NXP; Technolution; NEVS (National Electric Vehicle Sweden)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Scenario description:

Platoon formation with live traffic light data included in planner.
- Enabled live traffic light data included in planner
- Not using the Android app
- Starting at default locations
- This test was filmed, including the GUI.

Session description:

Platoon formation improvement by traffic light data.

Datasets descriptions:

AUTOPILOT_BrainPort_Platooning_DriverVehicleInteraction: Data extracted from the CAN of the vehicle

This dataset contains e.g. throttlestatus, clutchstatus, brakestatus, brakeforce, wipersstatus, steeringwheel for the vehicle

AUTOPILOT_BrainPort_Platooning_EnvironmentSensorsAbsolute: Data extracted from the vehicle environment sensors

This dataset contains information about detected objects, with absolute coordinates

AUTOPILOT_BrainPort_Platooning_EnvironmentSensorsRelative: Data extracted from the vehicle environment sensors

This dataset contains information about detected objects, with relative coordinates

AUTOPILOT_BrainPort_Platooning_IotVehicleMessage: Data sent between all devices, vehicles and services

Each sensor data submission is a Message. A Message has an Envelope, a Path, and optionally (but likely) Path Events and optionally Path Media. The envelope bears fundamental information about the individual sender (the vehicle), but not at a level of detail that would allow the owner of the vehicle to be identified or different messages to be linked to a single vehicle.

AUTOPILOT_BrainPort_Platooning_PlatoonFormation: Data sent from PlatoonService to vehicle

This dataset contains information about the route and speed for a specific vehicle for forming a platoon

AUTOPILOT_BrainPort_Platooning_PlatooningAction: Data logged by vehicle

This dataset contains information about the current status of the platooning

AUTOPILOT_BrainPort_Platooning_PlatooningEvent: Data logged by vehicle

This dataset contains information about the identifiers used for each specific platooning event

AUTOPILOT_BrainPort_Platooning_PlatoonStatus: Data sent by vehicle to PlatoonService

This dataset contains information about the current status of the platooning

AUTOPILOT_BrainPort_Platooning_PositioningSystem: Data from GPS on the vehicle

This dataset contains speed, longitude, latitude, heading from the GPS

AUTOPILOT_BrainPort_Platooning_PositioningSystemResample: Data from GPS on the vehicle

This dataset contains speed, longitude, latitude, heading from the GPS, resampled to 100 milliseconds

AUTOPILOT_BrainPort_Platooning_PSInfo: Data sent by PlatoonService to the vehicle

This dataset contains speed and route information for the vehicle to create a platoon

AUTOPILOT_BrainPort_Platooning_Target: Data from sensors on the vehicle

Target detection in the vicinity of the host vehicle, by a vehicle sensor or virtual sensor

AUTOPILOT_BrainPort_Platooning_Vehicle: Data from the CAN and sensors about the state of the vehicle

This dataset contains, among other things, the temperature and battery state of the vehicles

AUTOPILOT_BrainPort_Platooning_VehicleDynamics: Data from the CAN and sensors about the state of the vehicle

This dataset contains, among other things, accelerations and speedlimit of the vehicle, as observed from the CAN and the external sensors
Brainport, Platooning, platoon with live traffic light
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
TNO; TASS; NXP; Technolution; NEVS (National Electric Vehicle Sweden) (2020). Brainport, Platooning, platoon with live traffic light [Dataset]. http://doi.org/10.5281/zenodo.3606589
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3606589
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
TNO; TASS; NXP; Technolution; NEVS (National Electric Vehicle Sweden)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Scenario description:

Platoon formation and platooning, from Helmond to Eindhoven and back to the Automotive Campus.
- Starting in urban area with speed limits of 15 and 30 km/h.
- Driving East on the Europaweg with speed limits of 50 and 70 km/h. This includes 3 crossings with traffic lights.
- Driving on the N270, along the Automotive Campus. One crossing with traffic lights, just before the A270.
- Driving on the A270 (speed limit 100 km/h). Interrupted by one traffic light.
- U-turn at the fly-over or at the end of the A270, to return the same way to the Automotive Campus.

Session description:

Platoon formation and platooning, with live traffic light data included in planner.
- Live traffic light data available for planner
- Driver uses the Android app
- Starting at default locations
- Platooning (CACC and lane keeping) on the A270 when possible.

Datasets descriptions:

AUTOPILOT_BrainPort_Platooning_DriverVehicleInteraction: Data extracted from the CAN of the vehicle

This dataset contains e.g. throttlestatus, clutchstatus, brakestatus, brakeforce, wipersstatus, steeringwheel for the vehicle

AUTOPILOT_BrainPort_Platooning_EnvironmentSensorsAbsolute: Data extracted from the vehicle environment sensors

This dataset contains information about detected objects, with absolute coordinates

AUTOPILOT_BrainPort_Platooning_EnvironmentSensorsRelative: Data extracted from the vehicle environment sensors

This dataset contains information about detected objects, with relative coordinates

AUTOPILOT_BrainPort_Platooning_IotVehicleMessage: Data sent between all devices, vehicles and services

Each sensor data submission is a Message. A Message has an Envelope, a Path, and optionally (but likely) Path Events and optionally Path Media. The envelope bears fundamental information about the individual sender (the vehicle), but not at a level of detail that would allow the owner of the vehicle to be identified or different messages to be linked to a single vehicle.

AUTOPILOT_BrainPort_Platooning_PlatoonFormation: Data sent from PlatoonService to vehicle

This dataset contains information about the route and speed for a specific vehicle for forming a platoon

AUTOPILOT_BrainPort_Platooning_PlatooningAction: Data logged by vehicle

This dataset contains information about the current status of the platooning

AUTOPILOT_BrainPort_Platooning_PlatooningEvent: Data logged by vehicle

This dataset contains information about the identifiers used for each specific platooning event

AUTOPILOT_BrainPort_Platooning_PlatoonStatus: Data sent by vehicle to PlatoonService

This dataset contains information about the current status of the platooning

AUTOPILOT_BrainPort_Platooning_PositioningSystem: Data from GPS on the vehicle

This dataset contains speed, longitude, latitude, heading from the GPS

AUTOPILOT_BrainPort_Platooning_PositioningSystemResample: Data from GPS on the vehicle

This dataset contains speed, longitude, latitude, heading from the GPS, resampled to 100 milliseconds

AUTOPILOT_BrainPort_Platooning_PSInfo: Data sent by PlatoonService to the vehicle

This dataset contains speed and route information for the vehicle to create a platoon

AUTOPILOT_BrainPort_Platooning_Target: Data from sensors on the vehicle

Target detection in the vicinity of the host vehicle, by a vehicle sensor or virtual sensor

AUTOPILOT_BrainPort_Platooning_Vehicle: Data from the CAN and sensors about the state of the vehicle

This dataset contains, among other things, the temperature and battery state of the vehicles

AUTOPILOT_BrainPort_Platooning_VehicleDynamics: Data from the CAN and sensors about the state of the vehicle

This dataset contains, among other things, accelerations and speedlimit of the vehicle, as observed from the CAN and the external sensors
Potential weaknesses and other security issues per app.
figshare.com
Updated Jun 11, 2023
Cite
Vasileios Kouliaridis; Georgios Kambourakis; Efstratios Chatzoglou; Dimitrios Geneiatakis; Hua Wang (2023). Potential weaknesses and other security issues per app. [Dataset]. http://doi.org/10.1371/journal.pone.0251867.t006
Explore at:
Unique identifier
https://doi.org/10.1371/journal.pone.0251867.t006
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Vasileios Kouliaridis; Georgios Kambourakis; Efstratios Chatzoglou; Dimitrios Geneiatakis; Hua Wang
Description
Potential weaknesses and other security issues per app.
CoronaMelder Statistics
ckan.mobidatalab.eu
Updated Jul 13, 2023
Cite
OverheidNl (2023). CoronaMelder Statistics [Dataset]. https://ckan.mobidatalab.eu/dataset/coronamelder-statistieken
Explore at:
Available download formats: csv (http://publications.europa.eu/resource/authority/file-type/csv)
Dataset updated
Jul 13, 2023
Dataset provided by
OverheidNl
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Description
In this table you will find information about CoronaMelder. This concerns two variables:

1. The number of people who downloaded CoronaMelder
2. The number of people who warned others via CoronaMelder

1. The number of downloads is based on data from:
- App Store (iOS)
- Play Store (Android)
- Huawei App Gallery (Android)

2. If you have tested positive for corona, you can voluntarily indicate this in the app, together with an employee of the GGD. The numbers show how many people have done this.
cuckoo
impactcybertrust.org
search.datacite.org
Updated Jun 15, 2019
Cite
External Data Source (2019). cuckoo [Dataset]. http://doi.org/10.23721/100/1503942
Explore at:
Unique identifier
https://doi.org/10.23721/100/1503942
Dataset updated
Jun 15, 2019
Authors
External Data Source
Description
Cuckoo Sandbox is the leading open source automated malware analysis system. You can throw any suspicious file at it and in a matter of seconds Cuckoo will give you back detailed results outlining what the file did when executed inside an isolated environment.

Cuckoo Sandbox is free software that automates the task of analyzing any malicious file under Windows, OS X, Linux, and Android.

What can it do?

Cuckoo Sandbox is an advanced, extremely modular, and 100% open source automated malware analysis system with infinite application opportunities. By default it is able to:

- Analyze many different malicious files (executables, office documents, pdf files, emails, etc.) as well as malicious websites under Windows, Linux, Mac OS X, and Android virtualized environments.
- Trace API calls and general behavior of the file and distill this into high level information and signatures comprehensible by anyone.
- Dump and analyze network traffic, even when encrypted with SSL/TLS, with native network routing support to drop all traffic or route it through InetSIM, a network interface, or a VPN.
- Perform advanced memory analysis of the infected virtualized system through Volatility as well as on a process memory granularity using YARA.

Due to Cuckoo's open source nature and extensive modular design, one may customize any aspect of the analysis environment, analysis results processing, and reporting stage. Cuckoo provides everything you need to easily integrate the sandbox into your existing framework and backend in the way you want, with the format you want, and all of that without licensing requirements.


Network Traffic Android Malware

Malware and Benign Apps

6 scholarly articles cite this dataset.
