Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them are Probe Requests (PR), which are sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame part of PRs consists of variable-length fields, called Information Elements (IE), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Related dataset
Same authors also produced the Labeled dataset of IEEE 802.11 probe requests with same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device).
Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected:
- MAC address
- Supported data rates
- extended supported rates
- HT capabilities
- extended capabilities
- data under extended tag and vendor specific tag
- interworking
- VHT capabilities
- RSSI
- SSID
- timestamp when PR was received.
The collected data was forwarded to a remote database via a secure VPN connection.
A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database.
For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:
PR_IE_data =
{
'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext},
'HT_CAP': DATA_htcap,
'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
'VHT_CAP': DATA_vhtcap,
'INTERWORKING': DATA_inter,
'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
'VENDOR_SPEC': {VENDOR_1:{
'ID_1': DATA_1_vendor1,
'ID_2': DATA_2_vendor1
...},
VENDOR_2:{
'ID_1': DATA_1_vendor2,
'ID_2': DATA_2_vendor2
...}
...}
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{
'TIME': [ DATA_time ],
'RSSI': [ DATA_rssi ],
'DATA': PR_IE_data
}.
This data structure allows to store only 'TOA' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored.
The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval.
If identical PR's IE data from the same MAC address is already stored, only data for the keys 'TIME' and 'RSSI' are appended.
If identical PR's IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key.
The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
Folder structure
For ease of processing of the data, the dataset is divided into 7 folders, each containing a 24-hour period.
Each folder contains four files, each containing samples from that device.
The folders are named after the start and end time (in UTC).
For example, the folder [2022-09-22T22-00-00_2022-09-23T22-00-00](2022-09-22T22-00-00_2022-09-23T22-00-00) contains samples collected between 23th of September 2022 00:00 local time, until 24th of September 2022 00:00 local time.
Files representing their location via mapping:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo
The gateway devices (rPIs with WiFi dongle) were set up and gathering data before the start time of this dataset.
As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system.
Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspbery Pi-s were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
- location 3 -> nothernmost window in the building of Via Etnea near Piazza Università
- location 4 -> first window top the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access)
Under ideal circumstances, the locations of the devices and their coverage area would cover both squares and the part of Via Etna between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks.
Due to the limited capabilites of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device.
This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond.
For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day which is shown as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (rPi) is located on the second floor balcony and is hardwired to the Ethernet port. This device appears to function stably throughout the data collection period.
Its location is constant and is not disturbed, dataset seems to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building.
During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed.
As the device was moved back and forth, power outages and internet connection issues occurred.
The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building.
Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present.
This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point.
The device was placed statically on a windowsill overlooking the square.
Due to physical limitations, the device had lost power several times during the deployment.
The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of Resiloc project with the help of City of Catania and project partners.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them are probe requests (PR). They are sent by mobile devices in the unassociated state to search the nearby area for existing wireless networks. The frame part of PRs consists of variable length fields called information elements (IE). IE fields represent the capabilities of a mobile device, such as data rates.
The dataset includes PRs collected in a controlled rural environment and in a semi-controlled indoor environment under different measurement scenarios.
It can be used for various use cases, e.g., analysing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analysing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture Wi-Fi signal traffic in monitoring mode. Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each PR received is collected: MAC address, Supported data rates, extended supported rates, HT capabilities, extended capabilities, data under extended tag and vendor specific tag, interworking, VHT capabilities, RSSI, SSID and timestamp when PR was received.
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package for data collection, preprocessing and transmission.
Data preprocessing
The gateway collects PRs for each consecutive predefined scan interval (10 seconds). During this time interval, the data are preprocessed before being transmitted to the database.
For each detected PR in the scan interval, IEs fields are saved in the following JSON structure:
PR_IE_data =
{
'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext},
'HT_CAP': DATA_htcap,
'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
'VHT_CAP': DATA_vhtcap,
'INTERWORKING': DATA_inter,
'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
'VENDOR_SPEC': {VENDOR_1:{
'ID_1': DATA_1_vendor1,
'ID_2': DATA_2_vendor1
...},
VENDOR_2:{
'ID_1': DATA_1_vendor2,
'ID_2': DATA_2_vendor2
...}
...}
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{
'TIME': [ DATA_time ],
'RSSI': [ DATA_rssi ],
'DATA': PR_IE_data
}.
This data structure allows storing only TOA and RSSI for all PRs originating from the same MAC address and containing the same PR_IE_data. All SSIDs from the same MAC address are also stored.
The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval.
If identical PR's IE data from the same MAC address is already stored, then only data for the keys TIME and RSSI are appended.
If no identical PR's IE data has yet been received from the same MAC address, then PR_data structure of the new PR for that MAC address is appended to PROBE_REQs key.
The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data e.g. wireless gateway serial number and scan start and end timestamps. For an example of a single PR captured, see the ./Single_PR_capture_example.json file.
Environments description
We performed measurements in a controlled rural outdoor environment and in a semi-controlled indoor environment of the Jozef Stefan Institute.
See the Excel spreadsheet Measurement_informations.xlsx for a list of mobile devices tested.
Indoor environment
We used 3 RPi's for the acquisition of PRs in the Jozef Stefan Institute. They were placed indoors in the hallways as shown in the ./Figures/RPi_locations_JSI.png. Measurements were performed on weekend to minimize additional uncontrolled traffic from users' mobile devices. While there is some overlap in WiFi coverage between the devices at the location 2 and 3, the device at location 1 has no overlap with the other two devices.
Rural environment outdoors
The three RPi's used to collect PRs were placed at three different locations with non-overlapping WiFi coverage, as shown in ./Figures/RPi_locations_rural_env.png. Before starting the measurement campaign, all measured devices were turned off and the environment was checked for active WiFi devices. We did not detect any unknown active devices sending WiFi packets in the RPi's coverage area, so the deployment can be considered fully controlled.
All known WiFi enabled devices that were used to collect and send data to the database used a global MAC address, so they can be easily excluded in the preprocessing phase. MAC addresses of these devices can be found in the ./Measurement_informations.xlsx spreadsheet.
Note: The Huawei P20 device with ID 4.3 was not included in the test in this environment.
Scenarios description
We performed three different scenarios of measurements.
Individual device measurements
For each device, we collected PRs for one minute with the screen on, followed by PRs collected for one minute with the screen off. In the indoor environment the WiFi interfaces of the other devices not being tested were disabled. In rural environment other devices were turned off. Start and end timestamps of the recorded data for each device can be found in the ./Measurement_informations.xlsx spreadsheet under the Indoor environment of Jozef Stefan Institute sheet and the Rural environment sheet.
Three groups test
In this measurement scenario, the devices were divided into three groups. The first group contained devices from different manufacturers. The second group contained devices from only one manufacturer (Samsung). Half of the third group consisted of devices from the same manufacturer (Huawei), and the other half of devices from different manufacturers. The distribution of devices among the groups can be found in the ./Measurement_informations.xlsx spreadsheet.
The same data collection procedure was used for all three groups. Data for each group were collected in both environments at three different RPis locations, as shown in ./Figures/RPi_locations_JSI.png and ./Figures/RPi_locations_rural_env.png.
At each location, PRs were collected from each group for 10 minutes with the screen on. Then all three groups switched locations and the process was repeated. Thus, the dataset contains measurements from all three RPi locations of all three groups of devices in both measurement environments. The group movements and the timestamps for the start and end of the collection of PRs at each loacation can be found in spreadsheet ./Measurement_informations.xlsx.
One group test
In the last measurement scenario, all devices were grouped together. In rural evironement we first collected PRs for 10 minutes while the screen was on, and then for another 10 minutes while the screen was off. In indoor environment data were collected at first location with screens on for 10 minutes. Then all devices were moved to the location of the next RPi and PRs were collected for 5 minutes with the screen on and then for another 5 minutes with the screen off.
Folder structure
The root directory contains two files in JSON format for each of the environments where the measurements took place (Data_indoor_environment.json and Data_rural_environment.json). Both files contain collected PRs for the entire day that the measurements were taken (12:00 AM to 12:00 PM) to get a sense of the behaviour of the unknown devices in each environment. The spreadsheet ./Measurement_informations.xlsx. contains three sheets. Devices description contains general information about the tested devices, RPis, and the assigned group for each device. The sheets Indoor environment of Jozef Stefan Institute and Rural environment contain the corresponding timestamps for the start and end of each measurement scenario. For the scenario where the devices were divided into groups, additional information about the movements between locations is included. The location names are based on the RPi gateway ID and may differ from those on the figures showing the
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them are Probe Requests (PR), which are sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame part of PRs consists of variable-length fields, called Information Elements (IE), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Related dataset
Same authors also produced the Labeled dataset of IEEE 802.11 probe requests with same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device).
Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected:
- MAC address
- Supported data rates
- extended supported rates
- HT capabilities
- extended capabilities
- data under extended tag and vendor specific tag
- interworking
- VHT capabilities
- RSSI
- SSID
- timestamp when PR was received.
The collected data was forwarded to a remote database via a secure VPN connection.
A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database.
For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:
PR_IE_data =
{
'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext},
'HT_CAP': DATA_htcap,
'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
'VHT_CAP': DATA_vhtcap,
'INTERWORKING': DATA_inter,
'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
'VENDOR_SPEC': {VENDOR_1:{
'ID_1': DATA_1_vendor1,
'ID_2': DATA_2_vendor1
...},
VENDOR_2:{
'ID_1': DATA_1_vendor2,
'ID_2': DATA_2_vendor2
...}
...}
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{
'TIME': [ DATA_time ],
'RSSI': [ DATA_rssi ],
'DATA': PR_IE_data
}.
This data structure allows to store only 'TOA' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored.
The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval.
If identical PR's IE data from the same MAC address is already stored, only data for the keys 'TIME' and 'RSSI' are appended.
If identical PR's IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key.
The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
Folder structure
For ease of processing of the data, the dataset is divided into 7 folders, each containing a 24-hour period.
Each folder contains four files, each containing samples from that device.
The folders are named after the start and end time (in UTC).
For example, the folder [2022-09-22T22-00-00_2022-09-23T22-00-00](2022-09-22T22-00-00_2022-09-23T22-00-00) contains samples collected between 23th of September 2022 00:00 local time, until 24th of September 2022 00:00 local time.
Files representing their location via mapping:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo
The gateway devices (rPIs with WiFi dongle) were set up and gathering data before the start time of this dataset.
As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system.
Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspbery Pi-s were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
- location 3 -> nothernmost window in the building of Via Etnea near Piazza Università
- location 4 -> first window top the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access)
Under ideal circumstances, the locations of the devices and their coverage area would cover both squares and the part of Via Etna between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks.
Due to the limited capabilites of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device.
This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond.
For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day which is shown as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (rPi) is located on the second floor balcony and is hardwired to the Ethernet port. This device appears to function stably throughout the data collection period.
Its location is constant and is not disturbed, dataset seems to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building.
During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed.
As the device was moved back and forth, power outages and internet connection issues occurred.
The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building.
Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present.
This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point.
The device was placed statically on a windowsill overlooking the square.
Due to physical limitations, the device had lost power several times during the deployment.
The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of Resiloc project with the help of City of Catania and project partners.