95 datasets found
  1. Deaths registered weekly in England and Wales, provisional

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Nov 26, 2025
    Cite
    Office for National Statistics (2025). Deaths registered weekly in England and Wales, provisional [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/weeklyprovisionalfiguresondeathsregisteredinenglandandwales
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.

  2. Deaths, by month

    • www150.statcan.gc.ca
    • gimi9.com
    • +2 more
    Updated Feb 19, 2025
    Cite
    Government of Canada, Statistics Canada (2025). Deaths, by month [Dataset]. http://doi.org/10.25318/1310070801-eng
    Explore at:
    Dataset updated
    Feb 19, 2025
    Dataset provided by
    Government of Canada (http://www.gg.ca/)
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Number and percentage of deaths, by month and place of residence, 1991 to most recent year.

  3. #IndiaNeedsOxygen Tweets

    • kaggle.com
    zip
    Updated Nov 14, 2021
    Cite
    Kash (2021). #IndiaNeedsOxygen Tweets [Dataset]. https://www.kaggle.com/kaushiksuresh147/indianeedsoxygen-tweets
    Explore at:
    Available download formats: zip (4441094 bytes)
    Dataset updated
    Nov 14, 2021
    Authors
    Kash
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    India marks one COVID-19 death every 5 minutes


    Content

    People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.

    For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.

    India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.


    Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India's worst-hit cities, where a lockdown is in place to try to stem the transmission of the virus. (source)

    Dataset

    The dataset consists of tweets made with the #IndiaWantsOxygen hashtag, covering the past week. It currently contains 25,440 tweets in total and is updated on a daily basis.

    The description of the features is given below.

    | No | Column | Description |
    | -- | -- | -- |
    | 1 | user_name | The name of the user, as they've defined it. |
    | 2 | user_location | The user-defined location for this account's profile. |
    | 3 | user_description | The user-defined UTF-8 string describing their account. |
    | 4 | user_created | Time and date when the account was created. |
    | 5 | user_followers | The number of followers the account currently has. |
    | 6 | user_friends | The number of friends the account currently has. |
    | 7 | user_favourites | The number of favourites the account currently has. |
    | 8 | user_verified | When true, indicates that the user has a verified account. |
    | 9 | date | UTC time and date when the Tweet was created. |
    | 10 | text | The actual UTF-8 text of the Tweet. |
    | 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen. |
    | 12 | source | Utility used to post the Tweet; Tweets from the Twitter website have a source value of "web". |
    | 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
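
    As a quick orientation, the columns above can be loaded and summarised with pandas; a minimal sketch, assuming the CSV filename inside the Kaggle zip (adjust it to your download):

    ```python
    import pandas as pd

    # Filename inside the Kaggle zip is an assumption; adjust to your copy.
    df = pd.read_csv("IndiaWantsOxygen.csv", parse_dates=["user_created", "date"])

    # Share of tweets posted by verified accounts.
    verified_share = df["user_verified"].mean()

    # Tweets per day, to see how the hashtag trended over the week.
    tweets_per_day = df["date"].dt.date.value_counts().sort_index()

    print(f"{len(df)} tweets, {verified_share:.1%} from verified accounts")
    print(tweets_per_day)
    ```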

    Acknowledgements

    https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters

    Inspiration

    The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help, and of people all over the globe asking their own countries to support India by providing oxygen tanks.

    And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.

  4. Research Data (Health & Environmental).xlsx

    • figshare.com
    xlsx
    Updated Jun 11, 2023
    Cite
    Anik Chakraborty (2023). Research Data (Health & Environmental).xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.23498982.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anik Chakraborty
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Title: Dataset for IoT-Based Remote Health Monitoring System for Asthma Patients

    Description: This collection contains sensor data from an Internet of Things-based remote health monitoring system built for people with asthma. It includes both environmental and health parameters, giving insight into the patients' health and indoor environments. The dataset is a useful resource for studying the effectiveness of the monitoring system and examining asthma patients' medical conditions.

    Environmental Data: The file contains measurements of the room's temperature, humidity, and dust density. A DHT11 sensor was used to measure temperature and humidity, while an optical dust sensor was used to measure dust concentration. Over the course of a month, the data was gathered every 15 minutes.

    Health Data: The dataset also includes health parameters including body temperature, oxygen saturation, and heart rate (measured in beats per minute, or BPM). A MAX30100 sensor was used to measure the heart rate and the amount of oxygen in the blood, while a DS18B20 sensor was used to gauge the body temperature. Over the course of one week, measurements for each of these parameters were conducted at regular intervals on one patient.

    Data Analysis: A thorough data analysis, including descriptive analysis, graphical representation, and statistical testing, has been performed on the dataset. For both environmental and health factors, descriptive analysis involves computing a number of statistical measures, including mean, standard deviation, minimum, maximum, and percentage of data outside permissible limits. To see the trends and patterns in the data, graphs were created, including line plots, box plots, and time series plots. In order to investigate the variations and importance of health markers throughout various time periods, statistical tests like ANOVA were carried out.
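
    For illustration, the descriptive statistics named above (mean, standard deviation, minimum, maximum, and the percentage of readings outside permissible limits) could be reproduced along these lines; the column names and limit values are assumptions, not taken from the workbook:

    ```python
    import pandas as pd

    df = pd.read_excel("Research Data (Health & Environmental).xlsx")

    # Hypothetical permissible ranges; substitute the limits used in the paper.
    limits = {
        "temperature": (18.0, 27.0),  # deg C
        "humidity": (30.0, 60.0),     # percent relative humidity
    }

    for col, (lo, hi) in limits.items():
        s = df[col].dropna()
        outside = ((s < lo) | (s > hi)).mean() * 100
        print(f"{col}: mean={s.mean():.2f}, std={s.std():.2f}, "
              f"min={s.min():.2f}, max={s.max():.2f}, outside limits={outside:.1f}%")
    ```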

    Data Reproducibility: Researchers may use the technique described in the corresponding paper to replicate the data. This involves connecting the sensors (DHT11, MAX30100, DS18B20) for data collection, setting up the IoT-based monitoring system using NodeMCU and Arduino microcontrollers, and employing the relevant software platforms (Blynk, ThingSpeak) for real-time monitoring and data visualization. The corresponding author will provide more thorough instructions upon request.

  5. PMData

    • kaggle.com
    • huggingface.co
    zip
    Updated Apr 19, 2021
    Cite
    VT (2021). PMData [Dataset]. https://www.kaggle.com/datasets/vlbthambawita/pmdata-a-sports-logging-dataset/discussion
    Explore at:
    Available download formats: zip (1401630710 bytes)
    Dataset updated
    Apr 19, 2021
    Authors
    VT
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Paper: https://dl.acm.org/doi/10.1145/3339825.3394926

    In this dataset, we present the PMData dataset, which aims to combine traditional lifelogging with sports activity logging. Such a dataset enables the development of several interesting analysis applications, e.g., where additional sports data can be used to predict and analyze everyday developments like a person's weight and sleep patterns, and where traditional lifelog data can be used in a sports context to predict an athlete's performance. In this respect, we have used the Fitbit Versa 2 smartwatch wristband, the PMSys sports logging app, and Google Forms for the data collection, and PMData contains logging data for 5 months from 16 persons. Our initial experiments show that such analyses are possible, but there is still large room for improvement.

    Dataset details

    The structure of the main folder:

    • [Main folder]
      • p01
      • p02
      • ...
      • p16
      • participant-overview.xlsx

    The structure of each sub folder (pXX):

    • pXX [folder]: is a folder containing data of participant XX (notation XX represents the identifier of the participant).

      • fitbit [folder]

        • calories.json: shows how many calories the person has burned in the last minute.

        • distance.json: gives the distance moved per minute. Distance seems to be in centimeters.

        • exercise.json: describes each activity in more detail. It contains the date with start and stop time, time in different activity levels, type of activity and various performance metrics depending a bit on type of exercise, e.g., for running, it contains distance, time, steps, calories, speed and pace.

        • heart_rate.json: shows the number of heart beats per minute (bpm) at a given time.

        • lightly_active_minutes.json: sums up the number of lightly active minutes per day.

        • moderately_active_minutes.json: sums up the number of moderately active minutes per day.

        • resting_heart_rate.json: gives the resting heart rate per day.

        • sedentary_minutes.json: sums up the number of sedentary minutes per day.

        • sleep_score.csv: helps understand the sleep each night so you can see trends in the sleep patterns. It contains an overall 0-100 score made up from composition, revitalization and duration scores, the number of deep sleep minutes, the resting heart rate and a restlessness score.

        • sleep.json: is a per-sleep breakdown of the sleep into periods of light, deep and REM sleep, and time awake.

        • steps.json: displays the number of steps per minute.

        • time_in_heart_rate_zones.json: gives the number of minutes spent in different heart rate zones. Using the common formula of 220 minus your age, Fitbit calculates your maximum heart rate and then creates three target heart rate zones based on that number: fat burn (50 to 69 percent of your max heart rate), cardio (70 to 84 percent of your max heart rate), and peak (85 to 100 percent of your max heart rate).

        • very_active_minutes.json: sums up the number of very active minutes per day.

      • googledocs [folder]

        • reporting.csv: contains one line per report including the date reported for, a timestamp of the report submission time, the eaten meals (breakfast, lunch, dinner and evening meal), the participant's weight that day, the number of glasses drunk, and whether one has consumed alcohol.
      • pmsys [folder]

        • injury.csv: shows injuries with a time and date, the corresponding injury locations, and a minor or major severity.

        • srpe.csv: contains a training session's end time, type of activity, the perceived exertion (RPE), and the duration in minutes. This is, for example, used to calculate the session's training load or sRPE (RPE × duration); see the sketch after this list.

        • wellness.csv: includes parameters like time and date, fatigue, mood, readiness, sleep duration (number of hours), sleep quality, soreness (and soreness area), and stress. Fatigue, sleep quality, soreness, stress, and mood all have a 1-5 scale: the score 3 is normal, 1-2 are scores below normal, and 4-5 are scores above normal. Sleep length is just a measure of how long the sleep was in hours, and readiness (scale 0-10) is an overall subjective measure of how ready you are to exercise, i.e., 0 means not ready at all and 10 indicates that you cannot feel any better and are ready for anything!

        • food-images.zip: Participants 1, 3 and 5 have taken pictures of everything they have eaten except water during 2 months (February and March). There are food images included in this .zip file, and information about day and time is given in the...
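
    A minimal sketch of the two derived quantities defined above: the sRPE training load (RPE × duration) from srpe.csv, and Fitbit's three target heart rate zones from the 220-minus-age maximum. The srpe.csv column names are assumptions; check the header of your copy.

    ```python
    import pandas as pd

    # Column names here are assumptions; inspect srpe.csv's header first.
    srpe = pd.read_csv("p01/pmsys/srpe.csv")
    srpe["training_load"] = srpe["perceived_exertion"] * srpe["duration_min"]

    def heart_rate_zones(age: int) -> dict:
        """Fitbit-style zones derived from the 220-minus-age maximum heart rate."""
        hr_max = 220 - age
        return {
            "fat_burn": (0.50 * hr_max, 0.69 * hr_max),
            "cardio":   (0.70 * hr_max, 0.84 * hr_max),
            "peak":     (0.85 * hr_max, 1.00 * hr_max),
        }

    print(heart_rate_zones(30))  # e.g. fat burn is 95-131 bpm for a 30-year-old
    ```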

  6. Data for: World's human migration patterns in 2000-2019 unveiled by...

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    Cite
    Niva, Venla; Horton, Alexander; Virkki, Vili; Heino, Matias; Kallio, Marko; Kinnunen, Pekka; Abel, Guy J; Muttarak, Raya; Taka, Maija; Varis, Olli; Kummu, Matti (2024). Data for: World's human migration patterns in 2000-2019 unveiled by high-resolution data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7997133
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Wittgenstein Centre for Demography and Global Human Capital (http://www.oeaw.ac.at/wic/)
    Aalto University
    Authors
    Niva, Venla; Horton, Alexander; Virkki, Vili; Heino, Matias; Kallio, Marko; Kinnunen, Pekka; Abel, Guy J; Muttarak, Raya; Taka, Maija; Varis, Olli; Kummu, Matti
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    World
    Description

    This dataset provides a global gridded (5 arc-min resolution) detailed annual net-migration dataset for 2000-2019. We also provide global annual birth and death rate datasets – which were used to estimate the net-migration – for the same years. The dataset is presented in detail, with some further analyses, in the following publication; please cite this paper when using the data.

    Niva et al. 2023. World's human migration patterns in 2000-2019 unveiled by high-resolution data. Nature Human Behaviour 7: 2023–2037. Doi: https://doi.org/10.1038/s41562-023-01689-4

    You can explore the data in our online net-migration explorer: https://wdrg.aalto.fi/global-net-migration-explorer/

    Short introduction to the data

    For the dataset, we collected, gap-filled, and harmonised:

    comprehensive national-level birth and death rate datasets for altogether 216 countries or sovereign states; and

    sub-national data for births (data covering 163 countries, divided altogether into 2555 admin units) and deaths (123 countries, 2067 admin units).

    These birth and death rates were downscaled with selected socio-economic indicators to a 5 arc-min grid for each year 2000-2019. This allowed us to calculate the 'natural' population change, and by comparing it with the reported changes in population, we were able to estimate the annual net-migration. See more about the methods and calculations in Niva et al. (2023).
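
    The core bookkeeping behind that estimate can be sketched with toy numbers: natural change follows from the birth and death rates, and net migration is the residual against the reported population change. This only illustrates the accounting identity, not the paper's downscaling method.

    ```python
    import numpy as np

    # Toy 1x2 grids standing in for one year; real inputs are the rasters below.
    pop_start = np.array([[1000.0, 2000.0]])   # population at start of year
    pop_end = np.array([[1010.0, 1990.0]])     # reported population at end of year
    birth_rate = np.array([[12.0, 9.0]])       # births per 1000 people per year
    death_rate = np.array([[8.0, 10.0]])       # deaths per 1000 people per year

    natural_change = pop_start * (birth_rate - death_rate) / 1000.0
    net_migration = (pop_end - pop_start) - natural_change    # persons per cell
    net_migration_rate = 1000.0 * net_migration / pop_start   # per 1000 people
    print(net_migration, net_migration_rate)
    ```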

    We recommend using the data either summed over multiple years (we provide 3-, 5- and 20-year net-migration sums at gridded level) or aggregated over a larger area (we provide adm0, adm1 and adm2 level geospatial polygon files). This is due to some noise in the gridded annual data.

    Due to copyright issues we are not able to release all the original data collected, but it can be requested from the authors.

    List of datasets

    Birth and death rates:

    raster_birth_rate_2000_2019.tif: Gridded birth rate for 2000-2019 (5 arc-min; multiband tif)

    raster_death_rate_2000_2019.tif: Gridded death rate for 2000-2019 (5 arc-min; multiband tif)

    tabulated_adm1adm0_birth_rate.csv: Tabulated sub-national birth rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)

    tabulated_adm1adm0_death_rate.csv: Tabulated sub-national death rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)

    Net-migration:

    raster_netMgr_2000_2019_annual.tif: Gridded annual net-migration 2000-2019 (5 arc-min; multiband tif)

    raster_netMgr_2000_2019_3yrSum.tif: Gridded 3-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

    raster_netMgr_2000_2019_5yrSum.tif: Gridded 5-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

    raster_netMgr_2000_2019_20yrSum.tif: Gridded 20-yr sum net-migration 2000-2019 (5 arc-min)

    polyg_adm0_dataNetMgr.gpkg: National (adm 0 level) net-migration geospatial file (gpkg)

    polyg_adm1_dataNetMgr.gpkg: Provincial (adm 1 level) net-migration geospatial file (gpkg) (if not adm 1 level division, adm 0 used)

    polyg_adm2_dataNetMgr.gpkg: Communal (adm 2 level) net-migration geospatial file (gpkg) (if not adm 2 level division, adm 1 used; and if not adm 1 level division either, adm 0 used)

    Files to run online net migration explorer

    masterData.rds and admGeoms.rds are related to our online 'Net-migration explorer' tool (https://wdrg.aalto.fi/global-net-migration-explorer/). The source code of this application is available at https://github.com/vvirkki/net-migration-explorer. Running the application locally requires these two .rds files from this repository.

    Metadata

    Grids:

    Resolution: 5 arc-min (0.083333333 degrees)

    Spatial extent: Lon: -180, 180; -90, 90 (xmin, xmax, ymin, ymax)

    Coordinate ref system: EPSG:4326 - WGS 84

    Format: Multiband geotiff; each band for each year over 2000-2019

    Units:

    Birth and death rates: births/deaths per 1000 people per year

    Net-migration: persons per 1000 people per time period (year, 3yr, 5yr, 20yr, depending on the dataset)

    Geospatial polygon (gpkg) files:

    Spatial extent: -180, 180; -90, 83.67 (xmin, xmax, ymin, ymax)

    Temporal extent: annual over 2000-2019

    Coordinate ref system: EPSG:4326 - WGS 84

    Format: gpkg

    Units:

    Net-migration: persons per 1000 people per year

  7. Pedestrian Counting System - Past Hour (counts per minute)

    • researchdata.edu.au
    • data.melbourne.vic.gov.au
    • +1 more
    Updated Mar 7, 2023
    Cite
    City of Melbourne (2023). Pedestrian Counting System - Past Hour (counts per minute) [Dataset]. https://researchdata.edu.au/pedestrian-counting-system-counts-minute/3896337
    Explore at:
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    data.vic.gov.au
    Authors
    City of Melbourne
    Description

    Current issue 23/09/2020

    Please note: Sensors 67, 68 and 69 are showing duplicate records. We are currently working on a fix to resolve this.



    This dataset contains minute-by-minute directional pedestrian counts for the last hour from pedestrian sensor devices located across the city. The data is updated every 15 minutes and can be used to determine variations in pedestrian activity throughout the day.

    The sensor_id column can be used to merge the data with the Sensor Locations dataset, which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting historical pedestrian counting data.
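
    Such a merge on sensor_id might look like the sketch below; the CSV filenames are placeholders for the two downloads described here.

    ```python
    import pandas as pd

    counts = pd.read_csv("pedestrian-counts-past-hour.csv")    # placeholder name
    sensors = pd.read_csv("pedestrian-sensor-locations.csv")   # placeholder name

    # Attach location, status and direction metadata to each per-minute count.
    merged = counts.merge(sensors, on="sensor_id", how="left")
    ```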

    Note this dataset may not contain a reading for every sensor for every minute as sensor devices only create a record when one or more pedestrians have passed underneath the sensor.

    The Pedestrian Counting System helps us to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.

    Related datasets:
    Pedestrian Counting System – 2009 to Present (counts per hour).
    Pedestrian Counting System - Sensor Locations

  8. do you have a minute to get onchain? Price Prediction Data

    • coinbase.com
    Updated Dec 2, 2025
    Cite
    (2025). do you have a minute to get onchain? Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-do-you-have-a-minute-to-get-onchain-b668
    Explore at:
    Dataset updated
    Dec 2, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset do you have a minute to get onchain? over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
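
    The underlying arithmetic is plain compound growth, sketched below with the page's default 5 percent rate.

    ```python
    def predicted_price(current_price: float, annual_growth: float, years: int) -> float:
        """Compound the current price forward by a fixed annual growth rate."""
        return current_price * (1 + annual_growth) ** years

    # Default 5% growth over the full 16-year horizon: about 2.18x the start price.
    print(predicted_price(1.0, 0.05, 16))
    ```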

  9. Pedestrian Counting System (counts per hour)

    • melbournetestbed.opendatasoft.com
    • researchdata.edu.au
    • +1 more
    csv, excel, geojson +1
    Updated Aug 14, 2024
    Cite
    (2024). Pedestrian Counting System (counts per hour) [Dataset]. https://melbournetestbed.opendatasoft.com/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/api/
    Explore at:
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Aug 14, 2024
    Description

    This dataset contains hourly pedestrian counts since 2009 from pedestrian sensor devices located across the city. The data is updated on a monthly basis and can be used to determine variations in pedestrian activity throughout the day.

    The sensor_id column can be used to merge the data with the Pedestrian Counting System - Sensor Locations dataset, which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting pedestrian counts over time.

    Important notes about this dataset:
    • Where no pedestrians have passed underneath a sensor during an hour, a count of zero will be shown for the sensor for that hour.
    • Directional readings are not included, though we hope to make these available later in the year. Directional readings are provided in the Pedestrian Counting System – Past Hour (counts per minute) dataset.

    The Pedestrian Counting System helps to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.

    Related datasets:
    Pedestrian Counting System – Past Hour (counts per minute)
    Pedestrian Counting System - Sensor Locations

  10. SL Närliggande hållplatser 2

    • data.europa.eu
    json, xml
    Updated Sep 15, 2024
    Cite
    SL (2024). SL Närliggande hållplatser 2 [Dataset]. https://data.europa.eu/data/datasets/https-www-trafiklab-se-sites-default-files-dcat-dataset-25595-rdf
    Explore at:
    Available download formats: json, xml
    Dataset updated
    Sep 15, 2024
    Dataset authored and provided by
    SL
    License

    https://www.trafiklab.se/api/sl-narliggande-hallplatser-2/licens

    Description

    With this API you can get information about nearby stops to a provided location based on latitude and longitude.

    API key

    A valid API key is required; it is sent as the "key" parameter in all method calls. An API key is obtained by creating a project that uses this API. More about how to create and use API keys can be found here. To get an API key, you must accept the API's license terms. There are some restrictions on the number of calls per minute and per month.
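
    A call might look like the sketch below; the endpoint path and the parameter names other than "key" are assumptions based on Trafiklab's SL nearby-stops API, so check the API documentation before relying on them.

    ```python
    import requests

    resp = requests.get(
        "https://api.sl.se/api2/nearbystopsv2.json",  # assumed endpoint path
        params={
            "key": "YOUR_API_KEY",        # required in all method calls
            "originCoordLat": 59.3325,    # assumed parameter names
            "originCoordLong": 18.0649,
            "maxNo": 5,
        },
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json())
    ```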

    Note: To get the Gold level you need to write a detailed description of the project at Trafiklab. Projects that do not have a description will be denied the Gold level.

    Level    Max calls/month    Max calls/minute

    Bronze   10,000             30

    Silver   500,000            60

    Gold     As needed          As needed

    Our goal is that everyone who wants data should have access to it, and that it should remain free of charge. The limitations we have today are because we want to be able to guarantee that our technology works for as many people as possible. Before upgrading a service, we want a forecast of how much traffic the service will generate in relation to how many travelers will benefit from the information. To have your account upgraded, use the upgrade button next to your key.

  11. Social networks predict the life and death of honey bees - Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 15, 2021
    Cite
    Wild, Benjamin; Dormagen, David; Landgraf, Tim (2021). Social networks predict the life and death of honey bees - Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4438012
    Explore at:
    Dataset updated
    Jan 15, 2021
    Dataset provided by
    Freie Universität Berlin
    Authors
    Wild, Benjamin; Dormagen, David; Landgraf, Tim
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Interaction matrices and metadata used in "Social networks predict the life and death of honey bees"

    Preprint: Social networks predict the life and death of honey bees

    See the README file in bb_network_decomposition for example code.

    The following files are included:

    interaction_networks_20160729to20160827.h5

    The social interaction networks as a dense tensor and metadata.

    Keys:

    interactions: Tensor of shape (29, 2010, 2010, 9) (days × individuals × individuals × interaction types). I_{d,i,j,t} = log(1 + x), where x is the number of interactions of type t between individuals i and j on recording day d. See the methods section of the paper for the interaction types; a loading sketch follows this list.

    labels: Names of the 9 interaction types in the order they are stored in the interactions tensor.

    bee_ids: List of length 2010, mapping from sequential index used in the interaction tensor to the original BeesBook tag ID of the individual
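
    A loading sketch for the file, assuming the three keys above are stored as HDF5 datasets under those names:

    ```python
    import h5py
    import numpy as np

    with h5py.File("interaction_networks_20160729to20160827.h5", "r") as f:
        interactions = f["interactions"][:]   # shape (29, 2010, 2010, 9)
        labels = list(f["labels"][:])         # names of the 9 interaction types
        bee_ids = f["bee_ids"][:]             # index -> BeesBook tag ID

    # Undo the log(1 + x) transform to recover raw interaction counts.
    raw_counts = np.expm1(interactions)
    print(interactions.shape, len(bee_ids))
    ```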

    alive_bees_bayesian.csv

    This file contains the results of the Bayesian lifetime model, with one row for each bee.

    Columns:

    bee_id: Numerical unique identifier for each individual.

    days_alive: Number of days the bee was determined to be alive. If the individual was still alive at the end of the recording, the number of days from the day she hatched until the end of the recording.

    death_observed: Boolean indicator whether the death occurred during the recording period.

    annotated_tagged_date: Hatch date of the individual, i.e. the date she was tagged.

    inferred_death_date: The death date as determined by the model.

    bee_daily_data.csv

    This file contains one row per bee per day that she was alive for the focal period.

    Columns:

    bee_id: Numerical unique identifier for each individual.

    date: Date in year-month-day format.

    age: Age in days. Can be NaN if the bee has no associated death_date.

    network_age, network_age_1, network_age_2: The first three dimensions of network age.

    dance_floor, honey_storage, near_exit, brood_area_total: Normalized (sum to 1). Can be NaN if a bee had no high confidence detections (>0.9) for a given day. Can be 0 if a bee was only seen outside of the annotated areas.

    location_descriptor_count: The number of minutes the bee was seen in one of the location labels during that day. I.e., dance_floor * location_descriptor_count gives the number of minutes the bee was seen on the dance floor on the given day (see the sketch after this list).

    death_date: Date the bee was last seen in the colony in year-month-day format. Can be NaN for individuals that did not die until the end of the recording period.

    circadian_rhythm: R² value of a sine with a period of one day fitted to the velocity data of the individual over three days. Can be NaN if the fit did not converge due to a lack of data points.

    velocity_peak_time: Phase of the circadian sine fit in hours as an offset to 12:00 UTC. Can be NaN if circadian_rhythm is NaN.

    velocity_day, velocity_night: Mean velocity of the individual between 09:00-18:00 UTC and 21:00-06:00 UTC, respectively. Can be NaN if no velocity data was available for that interval.

    days_left: Difference in days between date and death_date. Can be NaN if death_date is NaN.
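
    For example, the minutes-on-dance-floor computation described above is a single column product:

    ```python
    import pandas as pd

    daily = pd.read_csv("bee_daily_data.csv", parse_dates=["date"])

    # Minutes the bee was seen on the dance floor on the given day.
    daily["dance_floor_minutes"] = (
        daily["dance_floor"] * daily["location_descriptor_count"]
    )
    ```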

    location_data.csv

    This file contains subsampled position information for all bees during the focal period. The data contains one row for every individual for every minute of the recording if that individual was seen at least once during that minute with a tag confidence of at least 0.9. The first matching detection for each individual is used.

    Columns:

    In addition to the bee_id and date columns as in the bee_daily_data.csv, the file contains these additional columns:

    cam_id, cams: The cam_id is a numerical identifier from {0, 1, 2, 3}. Each side of the hive is filmed by two cameras where {0, 1} and {2, 3} record the same side respectively. The cams column contains values either “(0, 1)” or “(2, 3)” and indicates to which sides of the hive this detection belongs.

    x_pos_hive, y_pos_hive: The spatial positions in millimeters on the hive. The two cameras from one side share a common coordinate system.

    location: The label that was assigned to the comb at (x_pos_hive, y_pos_hive) on the given date. The label “other” indicates detections that were outside of any annotated region. The label “not_comb” indicates the wooden frame or empty space around the comb.

    timestamp, date: The timestamp indicates the beginning of each one-minute sampling interval and is given in UTC, as indicated (example: “2016-08-13 00:00:00+00:00”). The date part of the timestamp is repeated in the “date” column. Both are given in year-month-day format.

    Software used to acquire and analyze the data:

    bb_network_decomposition: Network age calculation and regression analyses

    bb_pipeline: Tag localization and decoding pipeline

    bb_pipeline_models: Pretrained localizer and decoder models for bb_pipeline

    bb_binary: Raw detection data storage format

    bb_irflash: IR flash system schematics and arduino code

    bb_imgacquisition: Recording and network storage

    bb_behavior: Database interaction and data (pre)processing, velocity calculation

    bb_circadian: Circadian rhythm calculations

    bb_tracking: Tracking of bee detections over time

    bb_wdd: Automatic detection and decoding of honey bee waggle dances

    bb_interval_determination: Homography calculation

    bb_stitcher: Image stitching

  12. Data from: HRV-ACC: a dataset with R-R intervals and accelerometer data for...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 9, 2023
    Cite
    Kamil Książek; Wilhelm Masarczyk; Przemysław Głomb; Michał Romaszewski; Iga Stokłosa; Piotr Ścisło; Paweł Dębski; Robert Pudlo; Piotr Gorczyca; Magdalena Piegza (2023). HRV-ACC: a dataset with R-R intervals and accelerometer data for the diagnosis of psychotic disorders using a Polar H10 wearable sensor [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8171265
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    Institute of Theoretical and Applied Informatics, Polish Academy of Sciences
    Institute of Psychology, Humanitas University in Sosnowiec
    Department of Psychoprophylaxis, Faculty of Medical Sciences in Zabrze, Medical University of Silesia
    Psychiatric Department of the Multidisciplinary Hospital in Tarnowskie Góry
    Department of Psychiatry, Faculty of Medical Sciences in Zabrze, Medical University of Silesia
    Authors
    Kamil Książek; Wilhelm Masarczyk; Przemysław Głomb; Michał Romaszewski; Iga Stokłosa; Piotr Ścisło; Paweł Dębski; Robert Pudlo; Piotr Gorczyca; Magdalena Piegza
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    ABSTRACT

    The issue of diagnosing psychotic diseases, including schizophrenia and bipolar disorder (in particular, the objectification of symptom severity assessment) is still a problem requiring the attention of researchers. Two measures that can be helpful in patient diagnosis are heart rate variability calculated based on the electrocardiographic signal and accelerometer mobility data. The following dataset contains data from 30 psychiatric ward patients having schizophrenia or bipolar disorder and 30 healthy persons. The duration of the measurements for individuals was usually between 1.5 and 2 hours. R-R intervals necessary for heart rate variability calculation were collected simultaneously with accelerometer data using a wearable Polar H10 device. The Positive and Negative Syndrome Scale (PANSS) test was performed for each patient participating in the experiment, and its results were attached to the dataset. Furthermore, the code for loading and preprocessing data, as well as for statistical analysis, is included in the corresponding GitHub repository.

    BACKGROUND

    Heart rate variability (HRV), calculated based on electrocardiographic (ECG) recordings of R-R intervals stemming from the heart's electrical activity, may be used as a biomarker of mental illnesses, including schizophrenia and bipolar disorder (BD) [Benjamin et al]. The variations of R-R interval values correspond to the heart's autonomic regulation changes [Berntson et al, Stogios et al]. Moreover, the HRV measure reflects the activity of the sympathetic and parasympathetic parts of the autonomous nervous system (ANS) [Task Force of the European Society of Cardiology the North American Society of Pacing Electrophysiology, Matusik et al]. Patients with psychotic mental disorders show a tendency for a change in the centrally regulated ANS balance in the direction of less dynamic changes in the ANS activity in response to different environmental conditions [Stogios et al]. Larger sympathetic activity relative to the parasympathetic one leads to lower HRV, while, on the other hand, higher parasympathetic activity translates to higher HRV. This loss of dynamic response may be an indicator of mental health. Additional benefits may come from measuring the daily activity of patients using accelerometry. This may be used to register periods of physical activity and inactivity or withdrawal for further correlation with HRV values recorded at the same time.

    EXPERIMENTS

    In our experiment, the participants were 30 psychiatric ward patients with schizophrenia or BD and 30 healthy people. All measurements were performed using a Polar H10 wearable device. The sensor collects ECG recordings and accelerometer data and, additionally, performs detection of R wave peaks. Participants had to wear the sensor for a given time, usually between 1.5 and 2 hours; the shortest recording was 70 minutes. During this time, starting a few minutes after the measurement began, participants could perform any activity. They were encouraged to undertake physical activity and, more specifically, to take a walk. Because the patients were in the medical ward, they were instructed at the beginning of the experiment to take a walk in the corridors, and to repeat the walk 30 minutes and 1 hour after the first walk, with the subsequent walks slightly longer (about 3, 5 and 7 minutes, respectively). We did not remind participants of these instructions or supervise compliance during the experiment, in either the treatment or the control group. Seven persons from the control group did not receive this instruction; their measurements correspond to freely selected activities with rest periods, though at least three of them performed physical activities during this time. Nevertheless, at the start of the experiment, all participants were requested to rest in a sitting position for 5 minutes. Moreover, for each patient, disease severity was assessed using the PANSS test, and its scores are attached to the dataset.

    The data from the sensors were collected using the Polar Sensor Logger application [Happonen]. The extracted measurements were then preprocessed and analyzed using code prepared by the authors of the experiment. It is publicly available on the GitHub repository [Książek et al].

    Firstly, we performed manual artifact detection to remove abnormal heartbeats due to non-sinus beats and technical issues with the device (e.g. temporary disconnections and inappropriate electrode readings). We also performed anomaly detection using the Daubechies wavelet transform. Nevertheless, the dataset includes raw data, while the full code necessary to reproduce our anomaly detection approach is available in the repository. Optionally, it is also possible to perform cubic spline data interpolation. After that step, rolling windows of a particular size, with set time intervals between them, are created. Then, a statistical analysis is prepared, e.g. mean HRV calculation using the RMSSD (Root Mean Square of Successive Differences) approach, measuring the relationship between mean HRV and PANSS scores, mobility coefficient calculation based on accelerometer data, and verification of dependencies between HRV and mobility scores.

    DATA DESCRIPTION

    The structure of the dataset is as follows. One folder, called HRV_anonymized_data, contains values of R-R intervals together with timestamps for each experiment participant. The data was properly anonymized, i.e. the day of the measurement was removed to prevent person identification. Files concerning patients have the name treatment_X.csv, where X is the number of the person, while files related to the healthy controls are named control_Y.csv, where Y is the identification number of the person. Furthermore, for visualization purposes, an image of the raw R-R intervals for each participant is provided, named raw_RR_{control,treatment}_N.png, where N is the number of the person from the control/treatment group. The collected data are raw, i.e. before the anomaly removal. The code enabling reproduction of the anomaly detection stage and removal of suspicious heartbeats is publicly available in the repository [Książek et al]. The structure of the consecutive files collecting R-R intervals is as follows:

        Phone timestamp      RR-interval [ms]
        12:43:26.538000      651
        12:43:27.189000      632
        12:43:27.821000      618
        12:43:28.439000      621
        12:43:29.060000      661
        ...                  ...

    The first column contains the timestamp for which the distance between two consecutive R peaks was registered. The corresponding R-R interval is presented in the second column of the file and is expressed in milliseconds.
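
    Given this layout, one participant's file can be loaded and the RMSSD computed directly; anomaly removal (see the repository) is skipped in this sketch.

    ```python
    import numpy as np
    import pandas as pd

    rr = pd.read_csv("HRV_anonymized_data/treatment_1.csv")["RR-interval [ms]"]

    def rmssd(rr_ms) -> float:
        """Root Mean Square of Successive Differences, in milliseconds."""
        diffs = np.diff(np.asarray(rr_ms, dtype=float))
        return float(np.sqrt(np.mean(diffs ** 2)))

    print(f"RMSSD: {rmssd(rr):.1f} ms")
    ```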
    The second folder, called accelerometer_anonymized_data contains values of accelerometer data collected at the same time as R-R intervals. The naming convention is similar to that of the R-R interval data: treatment_X.csv and control_X.csv represent the data coming from the persons from the treatment and control group, respectively, while X is the identification number of the selected participant. The numbers are exactly the same as for R-R intervals. The structure of the files with accelerometer recordings is as follows:

        Phone timestamp      X [mg]    Y [mg]    Z [mg]
        13:00:17.196000      -961      -23       182
        13:00:17.205000      -965      -21       181
        13:00:17.215000      -966      -22       187
        13:00:17.225000      -967      -26       193
        13:00:17.235000      -965      -27       191
        ...                  ...       ...       ...

    The first column contains a timestamp, while the next three columns correspond to the currently registered acceleration in three axes: X, Y and Z, in milli-g unit.

    We also attached a file with the PANSS test scores (PANSS.csv) for all patients participating in the measurement. The structure of this file is as follows:

        no_of_person    PANSS_P    PANSS_N    PANSS_G    PANSS_total
        1               8          13         22         43
        2               11         7          18         36
        3               14         30         44         88
        4               18         13         27         58
        ...             ...        ...        ...        ...

    The first column contains the identification number of the patient, while the following columns refer to the PANSS scores related to positive, negative and general symptoms, and the total score, respectively.

    USAGE NOTES

    All the files necessary to run the HRV and/or accelerometer data analysis are available on the GitHub repository [Książek et al]. HRV data loading, preprocessing (i.e. anomaly detection and removal), as well as the calculation of mean HRV values in terms of the RMSSD, is performed in the main.py file. Also, Pearson's correlation coefficients between HRV values and PANSS scores and the statistical tests (Levene's and Mann-Whitney U tests) comparing the treatment and control groups are computed. By default, a sensitivity analysis is made, i.e. running the full pipeline for different settings of the window size for which the HRV is calculated and various time intervals between consecutive windows. Preparing the heatmaps of correlation coefficients and corresponding p-values can be done by running the utils_advanced_plots.py file after performing the sensitivity analysis. Furthermore, a detailed analysis for the one selected set of hyperparameters may be prepared (by setting sensitivity_analysis = False), i.e. for 15-minute window sizes, 1-minute time intervals between consecutive windows and without data interpolation method. Also, patients taking quetiapine may be excluded from further calculations by setting exclude_quetiapine = True because this medicine can have a strong impact on HRV [Hattori et al].

    The accelerometer data processing may be performed using the utils_accelerometer.py file. In this case, accelerometer recordings are downsampled to ensure the same timestamps as for R-R intervals and, for each participant, the mobility coefficient is calculated. Then, a correlation

  13. Percentage within a 15-minute walk of their nearest library - WMCA

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Dec 3, 2025
    Cite
    (2025). Percentage within a 15-minute walk of their nearest library - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-within-a-15-minute-walk-of-their-nearest-library-wmca/
    Explore at:
    Available download formats: excel, geojson, json, csv
    Dataset updated
    Dec 3, 2025
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This is the percentage of people living less than a 15-minute walk of their nearest library. Travel time to libraries analysis was performed in collaboration with the Ordnance Survey (OS).

    Distance and travel time analysis requires identification of origins and destinations, and a routable network from origin to destination.

    Destinations: Data about libraries are from the Sites Dataset, within the Land Use theme of the OS NGD API Features database. This has been supplemented using data from:

    • Arts Council for England Basic Dataset for Libraries
    • CILIP Cymru Wales Public Libraries by Operator Dataset

    This analysis excludes prison, archive, and mobile libraries, as well as static libraries with fewer than 2 opening hours per week in the Arts Council for England dataset.

    The analysis contains the locations of static public libraries, accurate to May of the year of publication. A walking speed of 4.8 kilometres per hour has been used.
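
    Those two parameters imply the maximum walking distance of a 15-minute catchment, sketched below; note the published analysis measures this along the routable network, not as a straight line.

    ```python
    # 15 minutes at 4.8 km/h corresponds to 1.2 km of walking along the network.
    walking_speed_kmh = 4.8
    walk_minutes = 15
    max_walk_km = walking_speed_kmh * walk_minutes / 60  # = 1.2
    ```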

    For routing analysis, a buffer from a destination is created using the OS Multi-modal Routable Network (MRN). This indicates which 100m grid squares lie within one hour's walk. The network generated by the MRN is then used to calculate the travel time between each 100m grid square and the destination. If a grid square is within the catchment of multiple destinations, the shortest time is chosen.

    Population estimates are calculated using Output Area (OA) population data. By aggregating distances and travel times for entire OAs, the estimates aren't exact proportions of the larger area's population. This approach might result in identical proportions across different time bands and variables because the same OAs are included or excluded.

    Data is Powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.

  14. Audio Commons Ground Truth Data for deliverables D4.4, D4.10 and D4.12

    • data.europa.eu
    unknown
    Updated Jan 23, 2020
    Cite
    Zenodo (2020). Audio Commons Ground Truth Data for deliverables D4.4, D4.10 and D4.12 [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-2546754?locale=hr
    Explore at:
    Available download formats: unknown (3086)
    Dataset updated
    Jan 23, 2020
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains the ground truth data used to evaluate the musical pitch, tempo and key estimation algorithms developed during the AudioCommons H2020 EU project, which are part of the Audio Commons Audio Extractor tool. It also includes ground truth information for the single-eventness audio descriptor developed for the same tool. This ground truth data has been used to generate the following documents:

    Deliverable D4.4: Evaluation report on the first prototype tool for the automatic semantic description of music samples
    Deliverable D4.10: Evaluation report on the second prototype tool for the automatic semantic description of music samples
    Deliverable D4.12: Release of tool for the automatic semantic description of music samples

    All these documents are available in the materials section of the AudioCommons website. All ground truth data in this repository is provided in the form of CSV files. Each CSV file corresponds to one of the individual datasets used in one or more evaluation tasks of the aforementioned deliverables. This repository does not include the audio files of each individual dataset, but includes references to the audio files. The following paragraphs describe the structure of the CSV files and give some notes about how to obtain the audio files in case these are needed.

    Structure of the CSV files

    All CSV files in this repository (with the sole exception of SINGLE EVENT - Ground Truth.csv) feature the following 5 columns:

    Audio reference: reference to the corresponding audio file. This will either be a string with the filename, or the Freesound ID (for one dataset based on Freesound content). See below for details about how to obtain those files.
    Audio reference type: will be one of Filename or Freesound ID, and specifies how the previous column should be interpreted.
    Key annotation: tonality information as a string with the form "RootNote minor/major". Audio files with no ground truth annotation for tonality are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10.
    Tempo annotation: tempo information as an integer representing beats per minute. Audio files with no ground truth annotation for tempo are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10. Note that integer values are used here because we only have tempo annotations for music loops, which typically feature only integer tempo values.
    Pitch annotation: pitch information as an integer representing the MIDI note number corresponding to the annotated pitch's frequency. Audio files with no ground truth annotation for pitch are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10.

    The remaining CSV file, SINGLE EVENT - Ground Truth.csv, has only the following 2 columns:

    Freesound ID: sound ID used in Freesound to identify the audio clip.
    Single Event: boolean indicating whether the corresponding sound is considered to be a single event or not. Single event annotations were collected by the authors of the deliverables as described in deliverable D4.10.

    How to get the audio data

    In this section we provide some notes about how to obtain the audio files corresponding to the ground truth annotations provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these ground truth annotations.

    Apple Loops (APPL): This dataset includes some of the music loops included in Apple's music software such as Logic or GarageBand. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.
    Carlos Vaquero Instruments Dataset (CVAQ): This dataset includes single instrument recordings carried out by Carlos Vaquero as part of his master thesis. Sounds are available as Freesound packs and can be downloaded at this page: https://freesound.org/people/Carlos_Vaquero/packs
    Freesound Loops 4k (FSL4): This dataset includes a selection of music loops taken from Freesound. Detailed instructions about how to set up this dataset are provided here.
    Giant Steps Key Dataset (GSKY): This dataset includes a selection of previews from Beatport annotated by key. Audio and original annotations available here.
    Good-sounds Dataset (GSND): This dataset contains monophonic recordings of instrument samples. Full description, original annotations and audio are available here.
    University of IOWA Musical Instrument Samples (IOWA): This dataset was created by the Electronic Music Studios of the University of IOWA and contains recordings of instrument samples. The dataset is available upon request by visiting this website.
    Mixcraft Loops (MIXL): This dataset includes some of the music loops included in Acoustica's Mixcraft music software. Access to these loops requires owning
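
    Given the five-column structure above, a ground truth file can be parsed and its MIDI pitch annotations converted to frequencies with the standard A4 = 440 Hz formula; the CSV filename below is a hypothetical example.

    ```python
    import pandas as pd

    gt = pd.read_csv("FSL4 - Ground Truth.csv")  # hypothetical filename

    def midi_to_hz(midi_note: int) -> float:
        """Standard MIDI-note-to-frequency conversion (A4 = MIDI 69 = 440 Hz)."""
        return 440.0 * 2 ** ((midi_note - 69) / 12)

    pitched = gt.dropna(subset=["Pitch annotation"]).copy()
    pitched["pitch_hz"] = pitched["Pitch annotation"].astype(int).map(midi_to_hz)
    ```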

  15. Mobile_usage_dataset_individual_person

    • kaggle.com
    zip
    Updated Mar 8, 2020
    Cite
    arul08 (2020). Mobile_usage_dataset_individual_person [Dataset]. https://www.kaggle.com/arul08/mobile-usage-dataset-individual-person
    Explore at:
    Available download formats: zip (617015 bytes)
    Dataset updated
    Mar 8, 2020
    Authors
    arul08
    Description

    Do you know?

    Do you know how much time you spend on an app? Do you know the total use time of a day or average use time of an app?

    What it consists of?

    This data set consists of:
    • how many times a person unlocks his phone
    • how much time he spends on every app on each day
    • how much time he spends on his phone

    It lists the usage time of apps for each day.

    What we can do?

    Use the test data to find the total minutes a given app is used in a day. We can get clear statistics of app usage. This data set also reveals the person's sleeping behaviour, as well as which apps he spends most of his time on. With this, we can improve the person's productivity.

    The dataset was collected from an app-usage tracking app.
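
    Totals and per-app averages like those described above are one groupby away; a minimal sketch, where the filename and column names are assumptions about the CSV layout:

    ```python
    import pandas as pd

    usage = pd.read_csv("mobile_usage.csv", parse_dates=["date"])  # assumed name

    total_per_day = usage.groupby("date")["minutes"].sum()   # total use per day
    avg_per_app = usage.groupby("app")["minutes"].mean()     # average per app
    print("Most-used app on average:", avg_per_app.idxmax())
    ```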

  16. SL Reseplanerare 3.1

    • data.europa.eu
    json, xml
    Updated Oct 11, 2021
    Cite
    SL (2021). SL Reseplanerare 3.1 [Dataset]. https://data.europa.eu/data/datasets/https-www-trafiklab-se-sites-default-files-dcat-dataset-25593-rdf
    Explore at:
    Available download formats: json, xml
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    SL
    License

    https://www.trafiklab.se/api/sl-reseplanerare-31/licens

    Description

    With this API you can get travel suggestions from A to B within Stockholm County with SL's traffic.

    In SL's journey planner there is also Waxholmsbolaget's traffic. The API can be used to calculate travel suggestions between any combination of location and/or stopping point. The API returns travel suggestions from the ‘best match’ of what is entered.

    Changes since SL Journey Planner 3

    A change has been made that may affect implementing applications. For version 3.1 of the Journey Planner, the response format for the crd elements of the polyline describing the detailed itinerary has been changed. The points are represented in this version as double-precision numbers, compared to version 3.0, which represented them as integers.

    SL Journey Planner 3.0:

    17973032 59360501

    SL Journey Planner 3.1:

    17.973032 59.3605019

    API key

    A valid API key is required; it is sent as the "key" parameter in all method calls. An API key is obtained by creating a project that uses this API. More about how to create and use API keys can be found here. To get an API key, you must accept the API's license terms. There are some restrictions on the number of calls per minute and per month.

    Note: To get the Gold level you need to write a detailed description of the project at Trafiklab. Projects that do not have a description will be denied the Gold level.

    Level    Max calls/month    Max calls/minute

    Bronze   10,000             30

    Silver   500,000            60

    Gold     As needed          As needed

    Our goal is that everyone who wants data should have access to it, and that it should remain free of charge. The limitations we have today are because we want to be able to guarantee that our technology works for as many people as possible. Before upgrading a service, we want a forecast of how much traffic the service will generate in relation to how many travelers will benefit from the information. To have your account upgraded, use the upgrade button next to your key.

  17. Spanish TTS Speech Dataset for Speech Synthesis

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Spanish TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-spanish-spain
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The Spanish TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native Spanish voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.

    Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.

    All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.

    Recording & Audio Quality

    Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
    SNR: Minimum 30 dB
    Channel: Mono
    Recording Duration: 20-30 minutes
    Recording Environment: Studio-controlled, acoustically treated rooms
    Per Speaker Volume: 1–2 hours of speech per artist
    Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

    Only clean, production-grade audio makes it into the final dataset.
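    As a quick sanity check of the stated format (48 kHz, mono WAV), the sketch below uses Python's standard wave module. The file name is hypothetical, and float-encoded 32-bit files, if present, may need a third-party reader such as soundfile instead.

    import wave

    def check_wav(path):
        # Verify the catalog's stated specs: 48 kHz sample rate, mono channel.
        with wave.open(path, "rb") as w:
            assert w.getframerate() == 48_000, "expected 48 kHz"
            assert w.getnchannels() == 1, "expected mono"
            bit_depth = w.getsampwidth() * 8
            duration = w.getnframes() / w.getframerate()
            print(f"{path}: {bit_depth}-bit, {duration:.1f} s")

    check_wav("sample_recording.wav")  # hypothetical file name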

    Voice Artist Selection

    All voice artists are native Spanish speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.

    Artist Profile:
    Gender: Male and Female
    Age Range: 20–60 years
    Regions: Native Spanish-speaking regions of Spain
    Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

    Script Quality & Coverage

    Scripts are not generic or repetitive: they are professionally authored by domain experts to reflect real-world use cases, avoiding redundancy while covering modern vocabulary, emotional range, and phonetically rich sentence structures.

    Word Count per Script: 3,000–5,000 words per 30-minute session
    Content Types:
    Storytelling
    Script and book reading
    Informational explainers
    Government service instructions
    E-commerce tutorials
    Motivational content
    Health & wellness guides
    Education & career advice
    Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity

    Transcripts & Alignment

    While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.

    Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
    Format: Available in plain text and JSON
    Post-processing:
    Corrected for disfluencies

  18. Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 6, 2023
    Cite
    Miha Mohorčič; Aleš Simončič; Mihael Mohorčič; Andrej Hrovat (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7509279
    Explore at:
    Dataset updated
    Jan 6, 2023
    Authors
    Miha Mohorčič; Aleš Simončič; Mihael Mohorčič; Andrej Hrovat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which describe the capabilities of a mobile device, such as its supported data rates.

    This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

    It can be used for various purposes, e.g., analyzing MAC address randomization, estimating the number of people at a given location at a given time or across time periods, or analyzing trends in population movement (streets, shopping malls, etc.).

    Related dataset

    The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, which uses the same data layout and recording equipment.

    Measurement setup

    The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle used to capture WiFi traffic in monitor mode (the gateway device). Passive PR monitoring is performed by listening to 802.11 traffic on a single WiFi channel and filtering for PR packets.

    The following information is collected for each received PR:
    - MAC address
    - supported data rates
    - extended supported rates
    - HT capabilities
    - extended capabilities
    - data under the extended tag and vendor-specific tag
    - interworking
    - VHT capabilities
    - RSSI
    - SSID
    - timestamp when the PR was received

    The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

    Data preprocessing

    The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:

    PR_IE_data = {
        'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
        'HT_CAP': DATA_htcap,
        'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
        'VHT_CAP': DATA_vhtcap,
        'INTERWORKING': DATA_inter,
        'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext, ...},
        'VENDOR_SPEC': {
            VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1, ...},
            VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2, ...},
            ...
        }
    }

    Supported data rates and extended supported rates are represented as arrays of values that encode the rates supported by a mobile device. The rest of the IE data is represented in hexadecimal format. The Vendor Specific Tag is structured differently from the other IEs: it can contain multiple vendor IDs, each with multiple data IDs and corresponding data. Similarly, the Extended Tag can contain multiple data IDs with corresponding data.
    Missing IE fields in a captured PR are not included in PR_IE_data.

    When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

    {'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

    where PR_data is structured as follows:

    { 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

    This data structure makes it possible to store only 'TIME' (time of arrival) and 'RSSI' for all PRs originating from the same MAC address and containing the same PR_IE_data. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the data already stored for the same MAC in the current scan interval. If identical IE data from that MAC address is already stored, only the 'TIME' and 'RSSI' values are appended; otherwise, the PR_data structure of the new PR is appended to the 'PROBE_REQs' list. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png.
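    A minimal sketch of this merge logic, assuming the structures described above (the function and variable names are illustrative, not the authors' actual code):

    def add_probe_request(records, mac, ssid, ie_data, toa, rssi):
        # records maps MAC -> {'MAC', 'SSIDs', 'PROBE_REQs'} for the
        # current scan interval, as described above.
        rec = records.setdefault(mac, {'MAC': mac, 'SSIDs': [], 'PROBE_REQs': []})
        if ssid and ssid not in rec['SSIDs']:
            rec['SSIDs'].append(ssid)
        for pr in rec['PROBE_REQs']:
            if pr['DATA'] == ie_data:      # identical IE data already stored:
                pr['TIME'].append(toa)     # append only TIME and RSSI
                pr['RSSI'].append(rssi)
                return
        # IE data not seen yet for this MAC: store a new PR_data structure
        rec['PROBE_REQs'].append({'TIME': [toa], 'RSSI': [rssi], 'DATA': ie_data})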

    At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.

    Folder structure

    For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, holding that device's samples.

    The folders are named after their start and end times (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from 23 September 2022 at 00:00 local time until 24 September 2022 at 00:00 local time.

    The file names map to recording locations as follows:
    - 1.json -> location 1
    - 2.json -> location 2
    - 3.json -> location 3
    - 4.json -> location 4
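    A minimal sketch for reading one day/location file and counting distinct MAC addresses. The layout inside each file (a list of 10-second scan intervals holding per-MAC records) is inferred from the description above, so treat the key names as assumptions and check the Single_PR_capture_example.json file for the authoritative format.

    import json

    # Hypothetical path: one day folder, location 1.
    path = "2022-09-22T22-00-00_2022-09-23T22-00-00/1.json"

    with open(path) as f:
        scans = json.load(f)  # assumed: a list of 10-second scan intervals

    macs = set()
    for scan in scans:
        # assumed key holding the per-MAC structures described above
        for device in scan.get("DATA", []):
            macs.add(device["MAC"])

    print(len(macs), "distinct MAC addresses at location 1")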

    Environments description

    The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correct installation and for the data status of the entire collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to a provided access point.

    Four Raspberry Pis were used:
    - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
    - location 2 -> southernmost window of the building on Via Etnea near Piazza del Duomo
    - location 3 -> northernmost window of the building on Via Etnea near Piazza Università
    - location 4 -> first window to the right of the entrance of the University of Catania

    Locations were suggested by the authors and adjusted during deployment based on physical constraints (availability of electrical outlets and internet access). Under ideal circumstances, the devices' locations and coverage areas would cover both squares and the stretch of Via Etnea between them, with partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

    Known dataset shortcomings

    Due to technical and physical limitations, the dataset contains some identified deficiencies.

    PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (on the order of seconds) may be unaccounted for between chunks if the transmission of the previous chunk took too long or an unexpected error occurred.

    Every 20 minutes the capture service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data are not recorded in each 20-minute period.

    The devices had a scheduled reboot at 4:00 each day, which appears as up to a few minutes of missing data.

    Location 1 - Piazza del Duomo - Chierici

    The gateway device (RPi) is located on a second-floor balcony and is hardwired to an Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset appears to have complete coverage.

    Location 2 - Via Etnea - Piazza del Duomo

    The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill; however, its exact movements cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days of the recording contain no PRs from this location.

    Location 3 - Via Etnea - Piazza Università

    Similar to location 2, the device was placed on the windowsill and moved around by the people working in the building: for example, it was placed on the windowsill while people were present and moved inside, behind a thick wall, when they were not. This device appears to have collected data throughout the whole dataset period.

    Location 4 - Piazza Università

    This location is connected wirelessly to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment, and the internet connection was also interrupted sporadically.

    Recognitions

    The data was collected within the scope of the Resiloc project with the help of the City of Catania and the project partners.

  19. Algerian Arabic TTS Speech Dataset for Speech Synthesis

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Algerian Arabic TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-arabic-algeria
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Algeria
    Dataset funded by
    FutureBeeAI
    Description

    The Algerian Arabic TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native Arabic voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.

    Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.

    All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.

    Recording & Audio Quality

    Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
    SNR: Minimum 30 dB
    Channel: Mono
    Recording Duration: 20-30 minutes
    Recording Environment: Studio-controlled, acoustically treated rooms
    Per Speaker Volume: 1–2 hours of speech per artist
    Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

    Only clean, production-grade audio makes it into the final dataset.

    Voice Artist Selection

    All voice artists are native Arabic speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.

    Artist Profile:
    Gender: Male and Female
    Age Range: 20–60 years
    Regions: Native Arabic-speaking regions of Algeria
    Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

    Script Quality & Coverage

    Scripts are not generic or repetitive: they are professionally authored by domain experts to reflect real-world use cases, avoiding redundancy while covering modern vocabulary, emotional range, and phonetically rich sentence structures.

    Word Count per Script: 3,000–5,000 words per 30-minute session
    Content Types:
    Storytelling
    Script and book reading
    Informational explainers
    Government service instructions
    E-commerce tutorials
    Motivational content
    Health & wellness guides
    Education & career advice
    Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity

    Transcripts & Alignment

    While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.

    Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
    Format: Available in plain text and JSON
    Post-processing:
    Corrected for disfluencies

  20. Norwegian TTS Speech Dataset for Speech Synthesis

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Norwegian TTS Speech Dataset for Speech Synthesis [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/tts-monolgue-norwegian-norway
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The Norwegian TTS Monologue Speech Dataset is a professionally curated resource built to train realistic, expressive, and production-grade text-to-speech (TTS) systems. It contains studio-recorded long-form speech by trained native Norwegian voice artists, each contributing 1 to 2 hours of clean, uninterrupted monologue audio.

    Unlike typical prompt-based datasets with short, isolated phrases, this collection features long-form, topic-driven monologues that mirror natural human narration. It includes content types that are directly useful for real-world applications, like audiobook-style storytelling, educational lectures, health advisories, product explainers, digital how-tos, formal announcements, and more.

    All recordings are captured in professional studios using high-end equipment and under the guidance of experienced voice directors.

    Recording & Audio Quality

    Audio Format: WAV, 48 kHz, available in 16-bit, 24-bit, and 32-bit depth
    SNR: Minimum 30 dB
    Channel: Mono
    Recording Duration: 20-30 minutes
    Recording Environment: Studio-controlled, acoustically treated rooms
    Per Speaker Volume: 1–2 hours of speech per artist
    Quality Control: Each file is reviewed and cleaned for common acoustic issues, including: reverberation, lip smacks, mouth clicks, thumping, hissing, plosives, sibilance, background noise, static interference, clipping, and other artifacts.

    Only clean, production-grade audio makes it into the final dataset.

    Voice Artist Selection

    All voice artists are native Norwegian speakers with professional training or prior experience in narration. We ensure a diverse pool in terms of age, gender, and region to bring a balanced and rich vocal dataset.

    Artist Profile:
    Gender: Male and Female
    Age Range: 20–60 years
    Regions: Native Norwegian-speaking regions of Norway
    Selection Process: All artists are screened, onboarded, and sample-approved using FutureBeeAI’s proprietary Yugo platform.

    Script Quality & Coverage

    Scripts are not generic or repetitive: they are professionally authored by domain experts to reflect real-world use cases, avoiding redundancy while covering modern vocabulary, emotional range, and phonetically rich sentence structures.

    Word Count per Script: 3,000–5,000 words per 30-minute session
    Content Types:
    Storytelling
    Script and book reading
    Informational explainers
    Government service instructions
    E-commerce tutorials
    Motivational content
    Health & wellness guides
    Education & career advice
    Linguistic Design: Balanced punctuation, emotional range, modern syntax, and vocabulary diversity

    Transcripts & Alignment

    While the script is used during the recording, we also provide post-recording updates to ensure the transcript reflects the final spoken audio. Minor edits are made to adjust for skipped or rephrased words.

    Segmentation: Time-stamped at the sentence level, aligned to actual spoken delivery
    Format: Available in plain text and JSON
    Post-processing:
    Corrected for disfluencies
