49 datasets found
  1. Daily Global Trends - Insights on Popularity

    • kaggle.com
    zip
    Updated Jan 16, 2023
    Cite
    The Devastator (2023). Daily Global Trends - Insights on Popularity [Dataset]. https://www.kaggle.com/datasets/thedevastator/daily-global-trends-2020-insights-on-popularity
    Explore at:
    zip (28034217 bytes)
    Dataset updated
    Jan 16, 2023
    Authors
    The Devastator
    Description

    Daily Global Trends - Insights on Popularity

    Analyzing Crowd Behaviour and Buzz Worldwide

    By Jeffrey Mvutu Mabilama [source]

    About this dataset

    This dataset provides a comprehensive look into 2020’s top trends worldwide, with information on the hottest topics and conversations happening all around the globe. With details such as trending type, country origin, dates of interest, URLs to find further information, keywords related to the trend and more - it's an invaluable insight into what's driving society today.

    You can combine this data with other sources to generate ideas for businesses or products tailored to popular interests and opinions. It is also a useful source for an international business perspective: learning what people in a given country search for can help you decide how best to engage with them.

    It also gives key insights into how buzz forms. By monitoring trends across many countries and time periods, you can analyse whether events have a lasting or short-lived effect, and how much impact they made in terms of the 'traffic' column (the number of searches for a topic) while they were trending. Marketing and advertising professionals can also anticipate which content is likely to be best received by an audience, using the related images/snippets and URL links provided with each trend, and so prepare campaigns targeted at specific regions with cultural context in mind rather than raw numbers alone.

    Last but not least, it is great starting material for getting acquainted with people from other countries online (at least you will know which conversation starters won't be awkward), before deepening your understanding of terms used largely within a single culture, such as TV programme titles. So the question is: what will be the next big thing? See for yourself.


    How to use the dataset

    How to use this dataset for Insights on Popularity?

    This Daily Global Trends 2020 dataset provides valuable information about trends around the world, including insights into their popularity. It can be used to identify popular topics and find ways to capitalize on them through marketing, business ideas and more. Below are some tips for using this data to gain insight into global trends and their level of popularity; a minimal pandas sketch follows the list.

    • For Business Ideas: Use the URL information provided in order to research each individual trend, analyzing both when it gained traction as well as when its popularity faded away (if at all). This will give insight into transforming a brief trend into a long-lived one or making use of an existing but brief surge in interest – think new apps related to a trending topic! Combining the geographic region listed with these timeframes gives even more granular insight that could be used for product localization or regional target marketing.

    • To study Crowd Behaviour & Dynamics: Explore both country-wise and globally trending topics by looking at which countries exhibit similar levels of interest in a given topic. Go further by investigating what drives people's interest in particular subjects in different countries; web scraping techniques can be employed on the URLs provided, combined with basic text analysis such as word clouds. This allows researchers and marketers to get better feedback from customers in multiple regions, enabling smarter decisions based on real behaviour rather than assumptions.

    • For Building Better Products & Selling Techniques: Combine the Category (Business, Social, etc.), Country and related keywords with the traffic figures to obtain granular information about what excites people across cultures. 'Food', for example, is popular everywhere, but certain variations may not sell in a given location unless they cater to local tastes; selling easy-to-prepare frozen food through supermarket chains, informed by the parallels between nutritional needs and shopping expenses, is one sales strategy this data could support. Combining the date information also helps you make predictions about buyer behaviour across seasons (selling seedless watermelons in winter, for instance, would be futile).

    • For Social & Small Talk opportunities - Incorporating recently descr...
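
    The sketch below is a minimal, hypothetical example of the first tip: estimating how long each topic trended per country and how much traffic it drew. The file name and the column names ("country", "topic", "date", "traffic") are assumptions based on the description above, not the dataset's confirmed schema.

    import pandas as pd

    # Hypothetical file and column names; adjust to the actual CSV schema.
    trends = pd.read_csv("daily_global_trends_2020.csv", parse_dates=["date"])

    lifespan = (
        trends.groupby(["country", "topic"])
              .agg(first_seen=("date", "min"),
                   last_seen=("date", "max"),
                   peak_traffic=("traffic", "max"))
    )
    # How many days each topic stayed in the trending lists, per country.
    lifespan["days_trending"] = (lifespan["last_seen"] - lifespan["first_seen"]).dt.days + 1
    print(lifespan.sort_values("peak_traffic", ascending=False).head(10))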

  2. Coronavirus (COVID-19) Mobility Report - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jul 10, 2020
    + more versions
    Cite
    (2020). Coronavirus (COVID-19) Mobility Report - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/coronavirus-covid-19-mobility-report
    Explore at:
    Dataset updated
    Jul 10, 2020
    Description

    Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. Ongoing, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA.

    GLA Covid-19 Mobility Report

    Since March 2020, London has seen many different levels of restrictions, including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets and local restaurants. This report gathers data from a number of sources, including Google, Apple, Citymapper, Purple WiFi and OpenTable, to assess the extent to which these levels of restrictions have translated into a reduction in Londoners' movements.

    The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules, nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility which, taken together, offer an overall impression of how Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.

    Public transport

    For the transport measures, we have included data from Google, Apple, Citymapper and Transport for London. They measure different aspects of public transport usage, depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.

    Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
    Citymapper | Citymapper mobility index | 2021-09-05 | Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 | 7.9% | 28% | 19%
    Google | Google Mobility Report | 2022-10-15 | Location data shared by users of Android smartphones, comparing the time and duration of visits to locations with the median values on the same day of the week in the five weeks from 3 Jan 2020 | 20.4% | 40% | 27%
    TfL Bus | Transport for London | 2022-10-30 | Bus journey 'taps' on the TfL network compared to the same day of the week in the four weeks starting 13 Jan 2020 | - | 34% | 24%
    TfL Tube | Transport for London | 2022-10-30 | Tube journey 'taps' on the TfL network compared to the same day of the week in the four weeks starting 13 Jan 2020 | - | 30% | 21%

    Pedestrian activity

    With the data we currently have it is harder to estimate pedestrian activity and high street busyness. A few indicators can give us information on how people are making trips out of the house:

    Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
    Walking | Apple Mobility Index | 2021-11-09 | Estimates the frequency of trips made on foot compared to a baseline of 13 Jan '20 | 22% | 47% | 36%
    Parks | Google Mobility Report | 2022-10-15 | Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to a baseline of the 5 weeks from 3 Jan '20 | 30% | 55% | 41%
    Retail & Rec | Google Mobility Report | 2022-10-15 | Estimates the frequency of trips to shops/leisure locations. Compared to a baseline of the 5 weeks from 3 Jan '20 | 30% | 55% | 41%

    Retail and recreation

    In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres and so on.

    Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
    Grocery/pharmacy | Google Mobility Report | 2022-10-15 | Estimates the frequency of trips to grocery shops and pharmacies. Compared to a baseline of the 5 weeks from 3 Jan '20 | 32% | 55% | 45%
    Retail/rec | Google Mobility Report | 2022-10-15 | Estimates the frequency of trips to shops/leisure locations. Compared to a baseline of the 5 weeks from 3 Jan '20 | 32% | 55% | 45%
    Restaurants | OpenTable State of the Industry | 2022-02-19 | London restaurant bookings made through OpenTable | 0% | 0.17% | 0.024%

    Home working

    The Google Mobility Report estimates changes in how many people are staying at home and going to places of work compared to normal. It is difficult to translate this into exact percentages of the population, but changes back towards 'normal' can be seen to start before any lockdown restrictions were lifted. These values are seven-day rolling (mean) averages, to avoid distortion by weekends and bank holidays; a pandas sketch of the same smoothing step follows the tables below.

    Name | Source | Latest | Baseline | Min/max value in Lockdown 1 | Min/max value in Lockdown 2 | Min/max value in Lockdown 3
    Residential | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are staying at home for work. Compared to a baseline of the 5 weeks from 3 Jan '20 | 131% | 119% | 125%
    Workplaces | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are going to places of work. Compared to a baseline of the 5 weeks from 3 Jan '20 | 24% | 54% | 40%

    Restriction | Date | End date | Average Citymapper | Average homeworking
    Work from home advised | 17 Mar '20 | 21 Mar '20 | 57% | 118%
    Schools, pubs closed | 21 Mar '20 | 24 Mar '20 | 34% | 119%
    UK enters first lockdown | 24 Mar '20 | 10 May '20 | 10% | 130%
    Some workers encouraged to return to work | 10 May '20 | 01 Jun '20 | 15% | 125%
    Schools open, small groups outside | 01 Jun '20 | 15 Jun '20 | 19% | 122%
    Non-essential businesses re-open | 15 Jun '20 | 04 Jul '20 | 24% | 120%
    Hospitality reopens | 04 Jul '20 | 03 Aug '20 | 34% | 115%
    Eat out to help out scheme begins | 03 Aug '20 | 08 Sep '20 | 44% | 113%
    Rule of 6 | 08 Sep '20 | 24 Sep '20 | 53% | 111%
    10pm Curfew | 24 Sep '20 | 15 Oct '20 | 51% | 112%
    Tier 2 (High alert) | 15 Oct '20 | 05 Nov '20 | 49% | 113%
    Second Lockdown | 05 Nov '20 | 02 Dec '20 | 31% | 118%
    Tier 2 (High alert) | 02 Dec '20 | 19 Dec '20 | 45% | 115%
    Tier 4 (Stay at home advised) | 19 Dec '20 | 05 Jan '21 | 22% | 124%
    Third Lockdown | 05 Jan '21 | 08 Mar '21 | 22% | 122%
    Roadmap 1 | 08 Mar '21 | 29 Mar '21 | 29% | 118%
    Roadmap 2 | 29 Mar '21 | 12 Apr '21 | 36% | 117%
    Roadmap 3 | 12 Apr '21 | 17 May '21 | 51% | 113%
    Roadmap out of lockdown: Step 3 | 17 May '21 | 19 Jul '21 | 65% | 109%
    Roadmap out of lockdown: Step 4 | 19 Jul '21 | 07 Nov '22 | 68% | 107%
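
    The homeworking series above are seven-day rolling means. Below is a minimal pandas sketch of that smoothing step; the file name and column names ("date", "workplaces") are assumptions for illustration, since the published report aggregates several sources into similar series.

    import pandas as pd

    # Hypothetical extract of one mobility series; adjust names to the real data.
    mobility = pd.read_csv("london_mobility.csv", parse_dates=["date"]).set_index("date")

    # Seven-day rolling (mean) average, as used for the homeworking figures,
    # to avoid distortion by weekends and bank holidays.
    mobility["workplaces_7d"] = mobility["workplaces"].rolling(window=7).mean()
    print(mobility[["workplaces", "workplaces_7d"]].tail())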

  3. Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jan 6, 2023
    Cite
    Miha Mohorčič; Aleš Simončič; Mihael Mohorčič; Andrej Hrovat (2023). Dataset of IEEE 802.11 probe requests from an uncontrolled urban environment [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7509279
    Explore at:
    Dataset updated
    Jan 6, 2023
    Authors
    Miha Mohorčič; Aleš Simončič; Mihael Mohorčič; Andrej Hrovat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.

    This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.

    It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.

    Related dataset

    The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, using the same data layout and recording equipment.

    Measurement setup

    The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.

    The following information is collected about each received PR:
    • MAC address
    • Supported data rates
    • Extended supported rates
    • HT capabilities
    • Extended capabilities
    • Data under the extended tag and vendor specific tag
    • Interworking
    • VHT capabilities
    • RSSI
    • SSID
    • Timestamp when the PR was received

    The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.

    Data preprocessing

    The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each PR detected in the scan interval, the IE fields are saved in the following JSON structure:

    PR_IE_data = {
        'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
        'HT_CAP': DATA_htcap,
        'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
        'VHT_CAP': DATA_vhtcap,
        'INTERWORKING': DATA_inter,
        'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext, ...},
        'VENDOR_SPEC': {
            VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1, ...},
            VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2, ...},
            ...
        }
    }

    Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
    Missing IE fields in the captured PR are not included in PR_IE_DATA.

    When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:

    {'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },

    where PR_data is structured as follows:

    { 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.

    This data structure makes it possible to store only 'TOA' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the data already stored for the same MAC in the current scan time interval. If identical PR IE data from the same MAC address is already stored, only the values for the keys 'TIME' and 'RSSI' are appended. If identical PR IE data from the same MAC address has not yet been received, the PR_data structure of the new PR is appended to the 'PROBE_REQs' key for that MAC address. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png.

    At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
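
    The sketch below is a minimal, hypothetical example of reading one capture file and counting probe requests per MAC address. The top-level layout of the JSON files is not fully specified above, so the code walks the structure generically and relies only on the documented 'MAC' and 'PROBE_REQs' keys; check Single_PR_capture_example.json for the exact schema before relying on it.

    import json
    from collections import Counter

    def iter_device_records(node):
        # Yield every {'MAC': ..., 'PROBE_REQs': [...]} record found anywhere in the file.
        if isinstance(node, dict):
            if "MAC" in node and "PROBE_REQs" in node:
                yield node
            else:
                for value in node.values():
                    yield from iter_device_records(value)
        elif isinstance(node, list):
            for item in node:
                yield from iter_device_records(item)

    # Example path built from the folder naming scheme described below.
    with open("2022-09-22T22-00-00_2022-09-23T22-00-00/1.json") as f:
        capture = json.load(f)

    per_mac = Counter()
    for record in iter_device_records(capture):
        per_mac[record["MAC"]] += len(record["PROBE_REQs"])

    print(len(per_mac), "distinct MAC addresses in this file")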

    Folder structure

    For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples collected by that device.

    The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from 00:00 local time on 23 September 2022 until 00:00 local time on 24 September 2022.

    The file names map to locations as follows:
    • 1.json -> location 1
    • 2.json -> location 2
    • 3.json -> location 3
    • 4.json -> location 4

    Environments description

    The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correct installation and for the data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.

    Four Raspberry Pis were used:
    • Location 1 -> Piazza del Duomo - Chierici building (balcony near the Fontana dell'Amenano)
    • Location 2 -> southernmost window of the building on Via Etnea near Piazza del Duomo
    • Location 3 -> northernmost window of the building on Via Etnea near Piazza Università
    • Location 4 -> first window to the right of the entrance of the University of Catania

    Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.

    Known dataset shortcomings

    Due to technical and physical limitations, the dataset contains some identified deficiencies.

    PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.

    Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can stop responding. For this reason, up to 20 seconds of data is not recorded in each 20-minute period.

    The devices had a scheduled reboot at 4:00 each day, which appears as up to a few minutes of missing data.

     Location 1 - Piazza del Duomo - Chierici
    

    The gateway device (RPi) is located on a second-floor balcony and is hardwired to an Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset appears to have complete coverage from it.

     Location 2 - Via Etnea - Piazza del Duomo
    

    The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill; however, exactly when the device was moved cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days of the record contain no PRs from this location.

     Location 3 - Via Etnea - Piazza Università
    

    Similar to Location 2, the device was placed on the windowsill and moved around by people working in the building: it was placed on the windowsill when people were present and moved inside, behind a thick wall, when they were not. This device appears to have been collecting data throughout the whole dataset period.

     Location 4 - Piazza Università
    

    This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment. The internet connection was also interrupted sporadically.

    Recognitions

    The data was collected within the scope of the Resiloc project, with the help of the City of Catania and project partners.

  4. Store Sales - T.S Forecasting...Merged Dataset

    • kaggle.com
    zip
    Updated Dec 15, 2021
    Cite
    Shramana Bhattacharya (2021). Store Sales - T.S Forecasting...Merged Dataset [Dataset]. https://www.kaggle.com/shramanabhattacharya/store-sales-ts-forecastingmerged-dataset
    Explore at:
    zip (2847585 bytes)
    Dataset updated
    Dec 15, 2021
    Authors
    Shramana Bhattacharya
    Description

    This dataset is a merged dataset created from the data provided in the competition "Store Sales - Time Series Forecasting". The other datasets provided there apart from train and test (for example holidays_events, oil, stores, etc.) could not be used directly in the final prediction. In my understanding, EDA of the merged dataset gives a clearer picture of the other factors that might also affect the final prediction of grocery sales. I therefore created this merged dataset and posted it here for further analysis.

    Data Description: Data Field Information (this is a copy of the description as provided in the original dataset)

    Train.csv
    • id: store id
    • date: date of the sale
    • store_nbr: identifies the store at which the products are sold.
    • family: identifies the type of product sold.
    • sales: gives the total sales for a product family at a particular store on a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).
    • onpromotion: gives the total number of items in a product family that were being promoted at a store on a given date.
    • Store metadata, including city, state, type, and cluster; cluster is a grouping of similar stores.
    • Holidays and Events, with metadata. NOTE: Pay special attention to the transferred column. A holiday that is transferred officially falls on that calendar day but was moved to another date by the government; a transferred day is more like a normal day than a holiday. To find the day it was celebrated, look for the corresponding row where the type is Transfer. For example, the holiday Independencia de Guayaquil was transferred from 2012-10-09 to 2012-10-12, which means it was celebrated on 2012-10-12. Days of type Bridge are extra days added to a holiday (e.g., to extend the break across a long weekend). These are frequently made up by a Work Day, which is a day not normally scheduled for work (e.g., a Saturday) that is meant to pay back the Bridge. Additional holidays are days added to a regular calendar holiday, for example, as typically happens around Christmas (making Christmas Eve a holiday).
    • dcoilwtico: daily oil price. Includes values during both the train and test data timeframes. (Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.)

    Note: There is a transaction column in the training dataset which records the sales transactions on that particular date.

    Test.csv
    • The test data, having the same features as the training data. You will predict the target sales for the dates in this file.
    • The dates in the test data are for the 15 days after the last date in the training data.

    Note: There is no transaction column in the test dataset as there is in the training dataset. Therefore, while building the model, you might exclude this column and use it only for EDA.

    submission.csv - A sample submission file in the correct format.
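
    The sketch below shows the kind of merge described above, built from the original competition files named earlier (train.csv, oil.csv, stores.csv, holidays_events.csv). It is an illustrative reconstruction, not the author's exact script; the key columns follow the field description above.

    import pandas as pd

    # Load the competition files (names as listed in the description above).
    train = pd.read_csv("train.csv", parse_dates=["date"])
    oil = pd.read_csv("oil.csv", parse_dates=["date"])
    stores = pd.read_csv("stores.csv")
    holidays = pd.read_csv("holidays_events.csv", parse_dates=["date"])

    merged = (
        train.merge(oil, on="date", how="left")          # adds dcoilwtico
             .merge(stores, on="store_nbr", how="left")  # adds city, state, type, cluster
             .merge(holidays, on="date", how="left", suffixes=("", "_holiday"))
    )
    print(merged.shape)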

  5. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    + more versions
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    zip
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
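
    As a minimal illustration of the pandas route, the snippet below reads one of the provided CSV files; the file name used here is hypothetical and should be replaced with the name of the downloaded file.

    import pandas as pd

    # Hypothetical file name for one of the daily-granularity CSV exports.
    daily = pd.read_csv("lifesnaps_daily.csv")
    print(daily.columns.tolist())
    print(daily.head())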

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed first.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
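
    Once restored, the database can be queried from Python. The sketch below is a minimal example using pymongo; the host, port, database and collection names follow the mongorestore commands above, while anything beyond that (fields other than _id) depends on the document format described next.

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    print(db.list_collection_names())            # expected: fitbit, sema, surveys
    print(db.fitbit.estimated_document_count())  # how many Fitbit documents were restored
    print(db.fitbit.find_one())                  # inspect one document's structure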

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  6. ComplexVAD Video Anomaly Detection Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 12, 2024
    Cite
    Furkan Mumcu; Mike Jones; Anoop Cherian; Yasin Yilmaz (2024). ComplexVAD Video Anomaly Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.11475281
    Explore at:
    zip
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Furkan Mumcu; Mike Jones; Anoop Cherian; Yasin Yilmaz
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Introduction

    The ComplexVAD dataset consists of 104 training and 113 testing video sequences taken from a static camera looking at a scene of a two-lane street with sidewalks on either side of the street and another sidewalk going across the street at a crosswalk. The videos were collected over a period of a few months on the campus of the University of South Florida using a camcorder with 1920 x 1080 pixel resolution. Videos were collected at various times during the day and on each day of the week. Videos vary in duration with most being about 12 minutes long. The total duration of all training and testing videos is a little over 34 hours. The scene includes cars, buses and golf carts driving in two directions on the street, pedestrians walking and jogging on the sidewalks and crossing the street, people on scooters, skateboards and bicycles on the street and sidewalks, and cars moving in the parking lot in the background. Branches of a tree also move at the top of many frames.

    The 113 testing videos have a total of 118 anomalous events consisting of 40 different anomaly types.

    Ground truth annotations are provided for each testing video in the form of bounding boxes around each anomalous event in each frame. Each bounding box is also labeled with a track number, meaning each anomalous event is labeled as a track of bounding boxes. A single frame can have more than one anomaly labeled.

    At a Glance

    • The size of the unzipped dataset is ~39GB
    • The dataset consists of Train sequences (containing only videos with normal activity), Test sequences (containing some anomalous activity), a ground truth annotation file for each Test sequence, and a README.md file describing the data organization and ground truth annotation format.
    • The zip files contain a Train directory, a Test directory, an annotations directory, and a README.md file.

    License

    The ComplexVAD dataset is released under CC-BY-SA-4.0 license.

    All data:

    Created by Mitsubishi Electric Research Laboratories (MERL), 2024
    
    SPDX-License-Identifier: CC-BY-SA-4.0
  7. Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho,...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Apr 27, 2021
    + more versions
    Cite
    Institute for Democracy in South Africa (IDASA) (2021). Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe [Dataset]. https://microdata.worldbank.org/index.php/catalog/889
    Explore at:
    Dataset updated
    Apr 27, 2021
    Dataset provided by
    Ghana Centre for Democratic Development (CDD-Ghana)
    Michigan State University (MSU)
    Institute for Democracy in South Africa (IDASA)
    Time period covered
    1999 - 2000
    Area covered
    Malawi, Africa, Botswana, Lesotho, South Africa, Zambia, Zimbabwe, Namibia
    Description

    Abstract

    Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12-country dataset released was pieced together from different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.

    The 7-country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed alongside other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12-country Round 1 dataset, all countries in this dataset were surveyed with an identical questionnaire.

    Geographic coverage

    Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe

    Analysis unit

    Basic units of analysis that the study investigates include: individuals and groups

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.

    The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.

    Sample Universe

    The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.

    What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.

    Sample Design

    The sample design is a clustered, stratified, multi-stage, area probability sample.

    To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.

    In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:

    The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample, and the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages:

    A first-stage to stratify and randomly select primary sampling units;

    A second-stage to randomly select sampling start-points;

    A third stage to randomly choose households;

    A final-stage involving the random selection of individual respondents

    We shall deal with each of these stages in turn.

    STAGE ONE: Selection of Primary Sampling Units (PSUs)

    The primary sampling units (PSU's) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.

    We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.

    Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but also to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.

    Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSU's in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.

    Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.

    Having stratified the sample, it is then possible to determine how many PSU's should be selected for the country as a whole, for each region, and for each urban or rural locality.

    The total number of PSU's to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogenous we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.

    These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSU's.
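
    The sketch below works through the PSU allocation arithmetic just described. The numbers (1200 interviews, 8 interviews per PSU, the Region X and Region Y shares) come from the text; the code itself is only an illustration.

    # PSU allocation arithmetic from the sampling protocol above.
    SAMPLE_SIZE = 1200
    INTERVIEWS_PER_PSU = 8

    total_psus = SAMPLE_SIZE // INTERVIEWS_PER_PSU   # 150 PSUs for a sample of 1200

    # Illustrative strata shares, echoing the Region X / Region Y examples.
    strata_population_share = {
        "Region X urban": 0.10,   # 10% of the national population -> 15 PSUs
        "Region Y rural": 0.04,   #  4% of the national population ->  6 PSUs
    }
    for stratum, share in strata_population_share.items():
        print(stratum, round(share * total_psus), "PSUs")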

    The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will

  8. MobMeter: a global mobility data set based on smartphone trajectories

    • zenodo.org
    csv
    Updated Jun 28, 2023
    + more versions
    Cite
    Francesco Finazzi; Francesco Finazzi (2023). MobMeter: a global mobility data set based on smartphone trajectories [Dataset]. http://doi.org/10.5281/zenodo.6984638
    Explore at:
    csv
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Francesco Finazzi; Francesco Finazzi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set is a supplement to a Scientific Reports article.

    The data set provides estimates of country-level daily mobility metrics (uncertainty included) for 17 countries from March 11, 2020 to present. Estimates are based on more than 3.8 million smartphone trajectories.

    • Metrics:
      • Estimated daily average travelled distance by people.
      • Estimated percentage of people who did not move during the 24 hours of the day.
    • Countries: Argentina (ARG), Chile (CHL), Colombia (COL), Costa Rica (CRI), Ecuador (ECU), Greece (GRC), Guatemala (GTM), Italy (ITA), Mexico (MEX), Nicaragua (NIC), Panama (PAN), Peru (PER), Philippines (PHL), Slovenia (SVN), Turkey (TUR), United States (USA) and Venezuela (VEN).
    • Covered period: from March 11, 2020 to present.
    • Temporal resolution: daily.
    • Temporal smoothing:
      • No smoothing.
      • 7-day moving average.
      • 14-day moving average.
      • 21-day moving average.
      • 28-day moving average.
    • Uncertainty: 95% bootstrap confidence interval.
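
    The sketch below illustrates the two choices listed above: a 7-day moving average of a daily metric and a 95% bootstrap confidence interval for a daily mean. It uses synthetic stand-in data; the published CSVs already contain the smoothed series and intervals.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)

    # Synthetic daily average travelled distance (km), standing in for one country's series.
    dates = pd.date_range("2020-03-11", periods=60, freq="D")
    daily_distance = pd.Series(rng.exponential(scale=5.0, size=len(dates)), index=dates)

    # 7-day moving average, as in the temporal smoothing options above.
    smoothed_7d = daily_distance.rolling(7).mean()
    print(smoothed_7d.tail())

    # 95% bootstrap confidence interval for one day's mean travelled distance.
    per_trajectory = rng.exponential(scale=5.0, size=1000)   # stand-in per-trajectory distances
    boot_means = [rng.choice(per_trajectory, size=per_trajectory.size, replace=True).mean()
                  for _ in range(2000)]
    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}] km")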

    Data ownership

    Anonymized data on smartphone trajectories are collected, owned and managed by Futura Innovation SRL. Smartphone trajectories are stored and analyzed on servers owned by Futura Innovation SRL and not shared with third parties, including the author of this repository and his organization (University of Bergamo).

    Contribution

    • Ilaria Cremonesi of Futura Innovation SRL is the data owner and data manager.
    • Francesco Finazzi of University of Bergamo developed the statistical methodology for the data analysis and the algorithms implemented on Futura Innovation SRL servers.

    Repository update

    CSV files of this repository are regularly produced by Futura Innovation SRL and published by the repository's author after validation.

  9. Data from: Moving around the city

    • researchdata.edu.au
    • data.nsw.gov.au
    • +1more
    Updated May 22, 2025
    Cite
    City of Sydney (2025). Moving around the city [Dataset]. https://researchdata.edu.au/moving-city/2295192
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    data.nsw.gov.au
    Authors
    City of Sydney
    Description

    Every day 1.3 million people live, work, study, do business, shop and go out in our local area. All of these people have an interest in the future of Sydney and can have their say on this plan. While 2050 may seem a long time away, we need to plan now if we are to meet the ongoing and future needs of our communities. For more information on planning for Sydney 2050, visit the City of Sydney website.

  10. HackCambridge Resources - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Nov 3, 2015
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2015). HackCambridge Resources - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/hackcambridge-resources
    Explore at:
    Dataset updated
    Nov 3, 2015
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Can you help #hackCambridge to tackle the city's challenges? Cambridge's brainpower is being challenged to take part in a 24-hour hackathon to find ways of using technology to tackle the pressures facing the city as it grows. Plans for over 33,000 houses to be built over the next 15 years will see an additional 50,000 people move into Cambridge and the surrounding area, presenting an unprecedented challenge for the city.

    Teams of hackers will work through the night, using their tech skills and the city's data to develop new ways to address the challenges to mobility, environment, health and social care. The next morning they'll pitch their ideas to a panel of experts who will award prizes for the best solutions.

    The hackathon is part of #hackCambridge, a day of talks, workshops and activities being held at The Junction, Cambridge from 12 noon on Saturday 31 October to explore how technology and creative thinking can help to improve city life. And it's not just for techies: local residents can get involved through a workshop for non-techies, meet their mobile phone Data Shadow commissioned by Collusion and developed by multi-media artist Mark Farid, see Cambridge built in Minecraft, and watch short films about the city's inventors.

  11. Crimes - One year prior to present

    • chicago.gov
    • data.cityofchicago.org
    • +2more
    csv, xlsx, xml
    Updated Nov 24, 2025
    + more versions
    Cite
    Chicago Police Department (2025). Crimes - One year prior to present [Dataset]. https://www.chicago.gov/city/en/dataset/crime.html
    Explore at:
    xlsx, xml, csv
    Dataset updated
    Nov 24, 2025
    Dataset authored and provided by
    Chicago Police Department (http://chicagopolice.org/)
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that have occurred in the City of Chicago over the past year, minus the most recent seven days of data. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited.

    The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://bit.ly/rk5Tpc.
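
    As an alternative to opening the export in a text editor, the CSV can be loaded with pandas, which handles well over 65,000 rows. A minimal sketch is below; the file name is an assumption based on the export step described above.

    import pandas as pd

    # Hypothetical name for the CSV exported from the portal.
    crimes = pd.read_csv("Crimes_-_One_year_prior_to_present.csv")
    print(len(crimes), "records")
    print(crimes.head())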

  12. Public Toilet Locations in Causeway Coast and Glens - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 22, 2018
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2018). Public Toilet Locations in Causeway Coast and Glens - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/public-toilet-locations-in-causeway-coast-and-glens
    Explore at:
    Dataset updated
    Jun 22, 2018
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Causeway Coast and Glens
    Description

    Location of all public toilets within the Causeway Coast and Glens Borough Council area. The Estates Service is responsible for the operation and upkeep of public conveniences. These facilities can be viewed on the map below. If you have any queries regarding opening hours or facilities, please contact: Environmental Services, Tel: 028 2766 0200, Email: info@causewaycoastandglens.gov.uk.

    For many disabled people the availability of wheelchair accessible toilets is essential if they are to go out and about and participate in the social environment. The provision and availability of wheelchair accessible public toilets is also critical. Local authorities are tasked with providing public toilets throughout the UK. The provision of public toilets also includes the maintenance of the buildings and their cleanliness. In carrying out this function, councils encounter vandalism on a daily basis. This disrupts the service and the availability of public toilets. In order to prevent vandalism and misuse of wheelchair accessible public toilets, many local authorities have adopted the National Key Scheme (N.K.S.). The local authority or organisation ensures keys are available to disabled persons who may wish to use these toilets. The National Key Scheme is recognised throughout the UK and Europe.

    Such keys may be purchased at a cost of £3.75 from:
    • Environmental Services Department, Coleraine Office, Cloonavin, 66 Portstewart Road, Coleraine BT52 1EY, Tel: 028 7034 7272
    • Ballymoney Office, Riada House, 14 Charles Street, Ballymoney BT53 6DZ, Tel: 028 2766 0200
    • Limavady Office, 7 Connell Street, Limavady BT49 0HA, Tel: 028 7772 2226
    • R.A.D.A.R, 12 City Forum, London, EV1V 8AS, Tel: 020 7250 3222

  13. Labour Force Survey Longitudinal Datasets, April 2001- : Secure Access

    • datacatalogue.ukdataservice.ac.uk
    Updated Aug 22, 2024
    + more versions
    Cite
    UK Data Service (2024). Labour Force Survey Longitudinal Datasets, April 2001- : Secure Access [Dataset]. http://doi.org/10.5255/UKDA-SN-7908-19
    Explore at:
    Dataset updated
    Aug 22, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Time period covered
    Mar 31, 2001 - Mar 31, 2023
    Area covered
    United Kingdom
    Description

    Background
    The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

    Longitudinal data
    The LFS retains each sample household for five consecutive quarters, with a fifth of the sample replaced each quarter. The main survey was designed to produce cross-sectional data, but the data on each individual have now been linked together to provide longitudinal information. The longitudinal data comprise two types of linked datasets, created using the weighting method to adjust for non-response bias. The two-quarter datasets link data from two consecutive waves, while the five-quarter datasets link across a whole year (for example January 2010 to March 2011 inclusive) and contain data from all five waves. Linking together records to create a longitudinal dimension can, for example, provide information on gross flows over time between different labour force categories (employed, unemployed and economically inactive). This will provide detail about people who have moved between the categories. Also, longitudinal information is useful in monitoring the effects of government policies and can be used to follow the subsequent activities and circumstances of people affected by specific policy initiatives, and to compare them with other groups in the population. There are however methodological problems which could distort the data resulting from this longitudinal linking. The ONS continues to research these issues and advises that the presentation of results should be carefully considered, and warnings should be included with outputs where necessary.
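
    The gross-flows idea described above amounts to cross-tabulating each person's labour force status in one quarter against their status in the next. The sketch below is a hypothetical illustration; the file and variable names (the two status variables and the longitudinal weight) are assumptions and should be taken from the LFS variable catalogue.

    import pandas as pd

    # Hypothetical extract of a two-quarter longitudinal file.
    lfs = pd.read_csv("lfs_two_quarter_longitudinal.csv")

    # Weighted gross flows between labour force categories (employed,
    # unemployed, economically inactive) across the two linked quarters.
    flows = pd.crosstab(
        lfs["STATUS_Q1"],          # status in the first quarter (assumed name)
        lfs["STATUS_Q2"],          # status in the second quarter (assumed name)
        values=lfs["LGWT"],        # longitudinal weight (assumed name)
        aggfunc="sum",
    )
    print(flows)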

    Secure Access data
    Secure Access longitudinal datasets for the LFS are available for two-quarters (SN 7908) and five-quarters (SN 7909). The two-quarter datasets are available from April 2001 and the five-quarter datasets are available from June 2010. The Secure Access versions include additional, detailed variables not included in the standard 'End User Licence' (EUL) longitudinal datasets (see under GNs 33315 and 33316).

    Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to:

    • day, month and year of birth
    • standard occupational classification (SOC) relating to second job, job made redundant from, last job, apprenticeships and occupation one year ago
    • five digit industry subclass relating to main job, last job, second job and job one year ago
    These extra variables are not available for every quarter or dataset. Users are advised to consult the 'LFS Variable Catalogue' file available in the Documentation section below for further information.

    Occupation data for 2021 and 2022 data files

    The ONS has identified an issue with the collection of some occupational data in the 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023, "Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022": https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022

    2022 Weighting

    The population totals used for the latest LFS estimates use projected growth rates from Real Time Information (RTI) data for UK, EU and non-EU populations based on 2021 patterns. The total population used for the LFS therefore does not take into account any changes in migration, birth rates, death rates, and so on since June 2021, and hence levels estimates may be under- or over-estimating the true values and should be used with caution. Estimates of rates will, however, be robust.
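
    A toy calculation (not LFS data) illustrates why rate estimates stay robust even when the grossing weights are uniformly mis-scaled: the scaling factor carries straight through to a weighted level but cancels out of a weighted rate.

```python
# Toy illustration: mis-scaling every weight by the same factor k changes the
# estimated *level* proportionally but leaves the estimated *rate* unchanged.
import numpy as np

rng = np.random.default_rng(0)
employed = rng.integers(0, 2, size=1_000)   # 1 = employed, 0 = not employed
weights = rng.uniform(50, 150, size=1_000)  # notional grossing weights

for k in (1.00, 0.95, 1.05):                # mis-scaling of the population totals
    w = weights * k
    level = (w * employed).sum()            # estimated number of people employed
    rate = (w * employed).sum() / w.sum()   # estimated employment rate
    print(f"k={k:.2f}  level={level:12.0f}  rate={rate:.4f}")
```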

    Documentation
    The study documentation presented in the Documentation section includes data dictionaries for all years, and the most recent LFS documentation only, due to available space. Documentation for previous years is provided alongside the data for access and is also available upon request.

    Latest edition information
    For the 19th edition (August 2023), the data files for 2021 and 2022 have been replaced with new versions which include revised Standard Occupational Classification (SOC) variables. Further information can be found in the ONS article published on 11 July 2023, "Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022", available with the study documentation.

  14. 20 Richest Counties in Florida

    • florida-demographics.com
    Updated Jun 20, 2024
    + more versions
    Cite
    Kristen Carney (2024). 20 Richest Counties in Florida [Dataset]. https://www.florida-demographics.com/counties_by_population
    Explore at:
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    Cubit Planning, Inc.
    Authors
    Kristen Carney
    License

    https://www.florida-demographics.com/terms_and_conditions

    Area covered
    Florida
    Description

    A dataset listing Florida counties by population for 2024.

  15. zomato_indore data

    • kaggle.com
    zip
    Updated Feb 7, 2020
    Cite
    sakshi (2020). zomato_indore data [Dataset]. https://www.kaggle.com/datasets/sakshiwork/zomato-indore-data
    Explore at:
    zip(58197 bytes)Available download formats
    Dataset updated
    Feb 7, 2020
    Authors
    sakshi
    Area covered
    Indore
    Description

    Context

    I have always been fascinated by the food culture of Indore. Indore is one of the best places for foodies; people visit to explore its mouth-watering food. The number of food places is increasing day by day and currently stands at approximately 3,000 restaurants. The industry hasn't been saturated yet, and new restaurants are opening every day. However, it has become difficult for them to compete with already established restaurants. The key issues that continue to pose a challenge include high real-estate costs, rising food costs, a shortage of quality manpower, a fragmented supply chain and over-licensing. This Zomato data aims at analysing the demography of each location. Most importantly, it will help new restaurants decide their theme, menu, cuisine, cost, etc. for a particular location. It also aims at finding similarity between neighborhoods on the basis of food. The dataset also contains reviews for each restaurant, which will help in finding an overall rating for each place.

    Content

    This dataset can be used to analyse different questions about food places and their popularity. These kinds of analyses can be done by studying factors such as (a minimal sketch follows the list):

    • Location of the restaurant
    • Approximate price of food
    • Whether the restaurant is theme-based or not
    • Which locality of the city serves a given cuisine with the maximum number of restaurants
    • The needs of people who are striving to get the best cuisine of the neighborhood
    • Whether a particular neighborhood is famous for its own kind of food
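
    As a minimal sketch of this kind of analysis, the snippet below counts restaurants per locality and cuisine and looks at typical price points. The file name and column names (location, cuisines, approx_cost) are assumptions for illustration; check them against the actual CSV header before running.

```python
# Minimal sketch: which localities serve which cuisines, and at what price point.
# File and column names are assumed for illustration.
import pandas as pd

df = pd.read_csv("zomato_indore.csv")

# Count restaurants per (locality, cuisine) pair
cuisine_counts = (
    df.assign(cuisine=df["cuisines"].str.split(", "))
      .explode("cuisine")
      .groupby(["location", "cuisine"])
      .size()
      .sort_values(ascending=False)
)
print(cuisine_counts.head(10))

# Typical price point per locality (assumes approx_cost is already numeric)
print(df.groupby("location")["approx_cost"].median().sort_values())
```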

    “Just so that you have a good meal the next time you step out”

    Acknowledgements

    The data was scraped entirely for educational purposes only. Note that I don't claim any copyright for the data; all copyrights for the data are owned by Zomato Media Pvt. Ltd.

    Inspiration

    I have always been astonished by how each restaurant is able to keep up the pace in spite of the cutting-edge competition, and by what factors should be kept in mind if someone wants to open a new restaurant. Does the demography of an area matter? Does the location of a particular type of restaurant depend on the people living in that area? Does the theme of the restaurant matter? Are any neighborhoods similar? What kind of food is more popular in a locality?

  16. US Gross Rent ACS Statistics

    • kaggle.com
    zip
    Updated Aug 23, 2017
    Cite
    Golden Oak Research Group (2017). US Gross Rent ACS Statistics [Dataset]. https://www.kaggle.com/goldenoakresearch/acs-gross-rent-us-statistics
    Explore at:
    zip(1733269 bytes)Available download formats
    Dataset updated
    Aug 23, 2017
    Dataset authored and provided by
    Golden Oak Research Group
    Area covered
    United States
    Description

    What you get:

    The database contains more than 40,000 records on US gross rent and geo-locations. The field descriptions are documented in the attached PDF file. To access all 325,272 records, on a scale roughly equivalent to a neighborhood (census tract), see the link below.

    Get the full free database with coupon code FreeDatabase; see the directions at the bottom of the description. The coupon ends at 2:00 pm on 8-23-2017.

    Gross Rent & Geographic Statistics:

    • Mean Gross Rent (double)
    • Median Gross Rent (double)
    • Standard Deviation of Gross Rent (double)
    • Number of Samples (double)
    • Square area of land at location (double)
    • Square area of water at location (double)

    Geographic Location:

    • Longitude (double)
    • Latitude (double)
    • State Name (character)
    • State abbreviated (character)
    • State_Code (character)
    • County Name (character)
    • City Name (character)
    • Name of city, town, village or CPD (character)
    • Primary: defines if the location is a tract and block group
    • Zip Code (character)
    • Area Code (character)
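
    As an illustration of how these fields can be combined, the sketch below rolls the tract-level records up to state level, weighting each tract's mean gross rent by its number of samples. The file name and column names (state_name, mean_rent, samples) are assumptions for illustration; map them to the headers documented in the attached PDF.

```python
# Minimal sketch: sample-weighted mean gross rent by state from tract-level rows.
# File and column names are assumed for illustration.
import pandas as pd

rent = pd.read_csv("us_gross_rent.csv")

rent["weighted"] = rent["mean_rent"] * rent["samples"]  # rent contribution per tract
grouped = rent.groupby("state_name")
state_means = (grouped["weighted"].sum() / grouped["samples"].sum()).sort_values(ascending=False)
print(state_means.head())
```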

    Abstract

    The data set was originally developed for real estate and business investment research. Income is a vital element when determining both the quality and the socioeconomic features of a given geographic location. The data was derived from more than 36,000 files and covers 348,893 location records.

    License

    Only proper citation is required; please see the documentation for details. Have fun!

    Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5 August 2017. Accessed: day, month, year.

    For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965 (please note: this is my personal number, and email is preferred).

    Check our data's accuracy: Census Fact Checker

    Access all 325,272 locations for free with the database coupon code:

    Don't settle. Go big and win big. Optimize your potential. Access all gross rent records and more on a scale roughly equivalent to a neighborhood; see the link below.

    A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.

  17. Birthday Paradox Visitor Data

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Birthday Paradox Visitor Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/birthday-paradox-visitor-data
    Explore at:
    zip(8451 bytes)Available download formats
    Dataset updated
    Jan 22, 2023
    Authors
    The Devastator
    Description

    Birthday Paradox Visitor Data

    Exploring Probability and Patterns of Day of the Week Birthdays

    By data.world's Admin [source]

    About this dataset

    This dataset contains daily visitor-submitted birthdays and associated data from an ongoing experiment known as the Birthday Paradox. Learn how many people have chosen the same day of the week for their birthday as yours, and get a better perspective on how this varies day to day, including recent submissions within the last 24 hours. The experiment is published under the MIT License, giving you access to detailed information behind this perplexing cognitive illusion. Find out why the probability of two people in the same room having matching birthdays is much higher than one might expect!
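
    For readers who want to see the arithmetic behind that claim, the short calculation below gives the standard result, assuming 365 equally likely birthdays and ignoring leap years: with only 23 people the chance of at least one shared birthday already exceeds 50%.

```python
# The classic birthday-paradox calculation: probability that at least two of n
# people share a birthday, assuming 365 equally likely days.
def p_shared_birthday(n: int) -> float:
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1.0 - p_all_distinct

for n in (10, 23, 50, 70):
    print(f"{n:3d} people -> P(shared birthday) = {p_shared_birthday(n):.3f}")
# 23 people already gives a probability just above 0.5
```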

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset provides data on the Birthday Paradox Visitor Experiments. It contains information such as daily visitor-submitted birthdays, the total number of visitors who have submitted birthdays, the total number of visitors who guessed the same day as their birthday, and more. This dataset can be used to analyze patterns in visitor behavior related to the Birthday Paradox Experiment.

    In order to use this dataset effectively and efficiently, it is important to understand its fields and variables:
    - Updated: The date when this data was last updated
    - Count: The total number of visitors who have submitted birthdays
    - Recent: The number of visitors who have submitted birthdays in the last 24 hours
    - binnedDay: The day of the week for a given visitor's birthday submission
    - binnedGuess: The day of the week that a given visitor guessed their birthday would fall on
    - Tally: The total number of visitors who guessed the same day as their birthday
    - binnedTally: The total number of visitors grouped by guess day

    To begin using this dataset, first filter the data based on your desired criteria, such as date range or binnedDay. For instance, if you are interested in analysing Birthday Paradox Experiment results for Monday submissions only, filter the data by binnedDay = 'Monday'. Then analyse the filtered query by examining other fields such as binnedGuess, and compare it with the tally or binnedTally results. For example, for the Monday entries above we could compare 'Monday' tallies with 'Tuesday' guesses (or any other weekday). Furthermore, the recent field can provide interesting insights into user behaviour related to the Birthday Paradox Experiment; tracking recent entries may yield valuable trends over time.

    By exploring the various combinations of fields available in this dataset, users will be able to gain a better understanding of how user behaviour differs across different days of the week, both within a single day and over periods of time, according to the different criteria provided by this dataset.
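
    The workflow described above can be expressed in a few lines of pandas. This is a minimal sketch using the column names listed in the data.csv table further down (updated, recent, binnedDay, binnedGuess, tally); the particular grouping shown is only one of many ways to compare guesses with matches.

```python
# Minimal sketch: filter submissions by day of the week and summarise guesses.
import pandas as pd

visits = pd.read_csv("data.csv", parse_dates=["updated"])

monday = visits[visits["binnedDay"] == "Monday"]          # Monday submissions only
by_guess = monday.groupby("binnedGuess")["tally"].sum()   # matched guesses per guessed day
print(by_guess.sort_values(ascending=False))

# Weekly totals of recent submissions, to spot trends in visitor behaviour over time
print(visits.set_index("updated")["recent"].resample("W").sum())
```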

    Research Ideas

    • Analyzing the likelihood that a visitor will guess their own birthday correctly.
    • Estimating which day of the week sees the most visitors submitting their birthdays and analyzing how this varies over time.
    • Investigating how likely it is for two people from different regions to have the same birthday by comparing their respective submission rates on each day of the week.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: data.csv

    | Column name | Description |
    |:------------|:------------|
    | updated     | The date and time the data was last updated. (DateTime) |
    | count       | The total number of visitor submissions. (Integer) |
    | recent      | The number of visitor submissions in the last 24 hours. (Integer) |
    | binnedDay   | The day of the week the visitor submitted their birthday. (String) |
    | binnedGuess | The day of the week the visitor guessed their birthday. (String) |
    | tally       | The total number of visitor guesses that matched their actual birthdays. (Integer) |
    | binnedTally | The day of the week the visitor guessed their birthday correctly. (String) |

    Acknowledgement...

  18. Zomato-Dataset-Exploratory-Data-Analysis

    • kaggle.com
    zip
    Updated Sep 15, 2022
    Cite
    Mohit Bhadauria (2022). Zomato-Dataset-Exploratory-Data-Analysis [Dataset]. https://www.kaggle.com/datasets/mohitbhadauria/zomato-dataset-eda
    Explore at:
    zip(113952 bytes)Available download formats
    Dataset updated
    Sep 15, 2022
    Authors
    Mohit Bhadauria
    Description

    The basic idea of analysing the Zomato dataset is to get a fair idea of the factors affecting the establishment of different types of restaurant in different places in Bengaluru, along with the aggregate rating of each restaurant. Bengaluru has more than 12,000 restaurants, serving dishes from all over the world. With new restaurants opening each day, the industry hasn't been saturated yet and demand is increasing day by day. In spite of the increasing demand, it has become difficult for new restaurants to compete with established restaurants, most of which serve the same food. Bengaluru is the IT capital of India, and most people here depend mainly on restaurant food as they don't have time to cook for themselves. With such an overwhelming demand for restaurants, it has become important to study the demography of a location: what kind of food is more popular in a locality, whether the entire locality prefers vegetarian food, and if so, whether that locality is populated by a particular community, for example Jains, Marwaris or Gujaratis, who are mostly vegetarian. These kinds of analyses can be done with the data by studying factors such as:

    • Location of the restaurant
    • Approximate price of food
    • Whether the restaurant is theme-based or not
    • Which locality of the city serves a given cuisine with the maximum number of restaurants
    • The needs of people who are striving to get the best cuisine of the neighborhood
    • Whether a particular neighborhood is famous for its own kind of food

    “Just so that you have a good meal the next time you step out”

    The data is accurate to what was available on the Zomato website until 15 March 2019. The data was scraped from Zomato in two phases. After going through the structure of the website, I found that for each neighborhood there are 6-7 categories of restaurants, viz. Buffet, Cafes, Delivery, Desserts, Dine-out, Drinks & nightlife, and Pubs and bars.

    Phase I,

    In Phase I of the extraction, only the URL, name and address of each restaurant were extracted, as these were visible on the front page. The URLs for each of the restaurants on Zomato were recorded in a CSV file so that the data could later be extracted individually for each restaurant. This made the extraction process easier and reduced the extra load on my machine. The data for each neighborhood and each category can be found here

    Phase II,

    In Phase II, the recorded data for each restaurant and each category were read, and the data for each restaurant were scraped individually. 15 variables were scraped in this phase: for each neighborhood and each category, online_order, book_table, rate, votes, phone, location, rest_type, dish_liked, cuisines, approx_cost(for two people), reviews_list and menu_item were extracted. See section 5 for more details about the variables.
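
    Before any of the analysis above, the Phase II variables usually need light cleaning. The sketch below shows one plausible first pass, assuming column names that mirror the list above (rate, approx_cost(for two people), online_order); verify them against the actual CSV header.

```python
# Minimal cleaning sketch for the scraped Zomato variables. Column names are
# assumed to mirror the list above; adjust to the actual file header.
import pandas as pd

zomato = pd.read_csv("zomato.csv")

# Ratings arrive as strings such as "4.1/5", "NEW" or "-"; keep the numeric part
zomato["rating"] = pd.to_numeric(
    zomato["rate"].astype(str).str.split("/").str[0], errors="coerce"
)

# Costs arrive as strings such as "1,200"; strip the thousands separator
zomato["cost_for_two"] = pd.to_numeric(
    zomato["approx_cost(for two people)"].astype(str).str.replace(",", ""),
    errors="coerce",
)

print(zomato.groupby("online_order")[["rating", "cost_for_two"]].mean())
```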

    Acknowledgements

    The data was scraped entirely for educational purposes only. Note that I don't claim any copyright for the data; all copyrights for the data are owned by Zomato Media Pvt. Ltd.

    Inspiration

    I have always been astonished by how each restaurant is able to keep up the pace in spite of the cutting-edge competition, and by what factors should be kept in mind if someone wants to open a new restaurant. Does the demography of an area matter? Does the location of a particular type of restaurant depend on the people living in that area? Does the theme of the restaurant matter? Is a food-chain restaurant likely to have more customers than its counterpart? Are any neighborhoods similar, and if two neighborhoods are similar, does that mean a particular group of people lives in both? What kind of food is more popular in a locality? Does the entire locality prefer vegetarian food, and if so, is that locality populated by a particular community, for example Jains, Marwaris or Gujaratis, who are mostly vegetarian? There are in fact dozens of questions in my mind; let's try to find the answers with this dataset.

    For detailed discussion of the business problem, please visit this link

    Please visit this link to find codebook cum documentation for the data

    GITHUB LINk : https://github.com/mohitbhadauria02/Zomato-Dataset-using-Exploratory-Data-Analysis.git

  19. Sample: Hand Wash Dataset

    • kaggle.com
    zip
    Updated Apr 19, 2020
    Cite
    real-timeAR (2020). Sample: Hand Wash Dataset [Dataset]. https://www.kaggle.com/realtimear/hand-wash-dataset
    Explore at:
    zip(1299952173 bytes)Available download formats
    Dataset updated
    Apr 19, 2020
    Authors
    real-timeAR
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    The link to the GitHub repository that contains an application of this dataset can be found here.

    This dataset is currently a sample dataset to represent the entire Hand Wash Dataset. The main intent of this dataset is to be used as a benchmark for action recognition tasks in the future. Typically, most publicly available action recognition datasets are created for generic actions performed day to day, or for actions of people playing a particular sport, for example UCF-101, UCF Sports, KTH and others.

    The actions performed in these datasets involve large movements from frame to frame, and each of these actions is quite distinct from the others, in the sense that they are performed in widely different settings and environments. For example, the action of applying make-up is vastly different from a field hockey penalty, which are both classes in UCF101.

    When it comes to identifying each step involved in the hand wash procedure as prescribed by the World Health Organisation, small changes in hand position and changes in the environment need to be taken into account. Since the magnitude of these movements is very small, other action recognition algorithms have a hard time accurately identifying a change from one step to the next.

    Additionally, current datasets are not very representative of a potential real-world application of action recognition, with constraints such as a fixed camera position, a mostly static background, and adequate training data. Hence, the importance of creating a new dataset was further highlighted.

    Content

    The Hand Wash Dataset consists of 292 individual videos of hand washes (with each hand wash having 12 steps, for a total of 3,504 clips), in different environments to provide as much variance as possible. The variance was important to ensure that the model is robust and can work in more than a few environments. The varied parameters are:

    • Illumination
    • Background
    • Source Camera Position
    • Field of view
    • Individuals performing the hand wash

    This was done because the Hand Wash Dataset intends to simulate the real-world constraints of a potential application of an action recognition solution such as: fixed camera position, real-time feedback, varying illumination, static background and applied to one domain-specific fine-grained action task.

    The next step was to identify each step involved in the procedure. Hence, the original 7 Steps prescribed by the WHO were further broken down into 12 actions which take into account every possible action that may be performed when a person washes their hands. The action classes are:

    1. Step 1
    2. Step 2 Left
    3. Step 2 Right
    4. Step 3
    5. Step 4 Left
    6. Step 4 Right
    7. Step 5 Left
    8. Step 5 Right
    9. Step 6 Left
    10. Step 6 Right
    11. Step 7 Left
    12. Step 7 Right
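
    A minimal indexing sketch for these 12 classes is shown below. It assumes, hypothetically, that the clips are organised into one sub-folder per class named exactly after the steps above; adjust the root folder, class names and file extension to match the actual release.

```python
# Minimal sketch: build (video_path, label) pairs for an action-recognition
# pipeline, assuming one sub-folder per class under a hypothetical root folder.
from pathlib import Path

CLASSES = [
    "Step 1", "Step 2 Left", "Step 2 Right", "Step 3",
    "Step 4 Left", "Step 4 Right", "Step 5 Left", "Step 5 Right",
    "Step 6 Left", "Step 6 Right", "Step 7 Left", "Step 7 Right",
]
LABEL = {name: idx for idx, name in enumerate(CLASSES)}

def index_clips(root: str) -> list[tuple[str, int]]:
    """Return (video_path, label) pairs for every clip under root/<class name>/."""
    samples = []
    for name, idx in LABEL.items():
        for video in sorted(Path(root, name).glob("*.mp4")):
            samples.append((str(video), idx))
    return samples

if __name__ == "__main__":
    clips = index_clips("HandWashDataset")  # hypothetical root folder
    print(f"indexed {len(clips)} clips across {len(CLASSES)} classes")
```
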
  20. Commuter Survey Dataset — Travel Time & Challenges

    • kaggle.com
    zip
    Updated Aug 16, 2025
    Cite
    Adharshini Kumaresan (2025). Commuter Survey Dataset — Travel Time & Challenges [Dataset]. https://www.kaggle.com/datasets/adharshinikumar/commuter-survey-dataset-travel-time-and-challenges
    Explore at:
    zip(3631 bytes)Available download formats
    Dataset updated
    Aug 16, 2025
    Authors
    Adharshini Kumaresan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The FlowSync Commuter Survey 2025 is designed to capture the realities of daily urban commuting in India, with a focus on travel time, route choices, flexibility, and challenges faced by commuters.

    With rising congestion, long commute hours, and the growing demand for flexible work arrangements, understanding commuter behavior is critical for urban mobility planning, corporate HR strategies, and smart transport solutions.

    This dataset contains anonymized responses from participants who shared:

    Their average daily commute time and routes

    Willingness to shift travel times to off-peak slots

    Preferred departure slots for flexibility in travel

    Incentive preferences to encourage schedule adjustments

    By combining quantitative commute metrics with qualitative preferences, this dataset provides a rare view into how people balance time, convenience, and incentives in their daily travel.

    🔑 Key Features

    Commute Duration (in minutes)

    Route Information (origin → destination)

    Travel Flexibility (willingness to shift commute time)

    Preferred Evening Departure Slots

    Incentive Preferences (transport stipends, flexible perks, etc.)

    🚀 Use Cases

    This dataset is ideal for data analysts, urban researchers, and AI/ML practitioners exploring real-world commuting behaviors. Some possible applications include:

    • Urban Planning & Smart Cities
      - Identifying high-congestion routes and times
      - Designing policies to encourage off-peak commuting
    • Transport Optimization
      - Studying how incentives affect commuting flexibility
      - Evaluating demand for staggered work hours
    • Corporate HR & Workplace Strategy
      - Supporting remote/hybrid policy decisions
      - Designing incentive models for flexible office timings
    • AI & Machine Learning Applications (see the sketch after this list)
      - Predictive models for commute time under different conditions
      - Clustering commuters based on behavior & preferences
    • Policy & Sustainability Research
      - Analyzing willingness to adopt eco-friendly modes
      - Supporting strategies to reduce peak-hour congestion
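
    As a minimal sketch of the clustering application flagged in the list above, the snippet below segments respondents by commute duration and willingness to shift travel times. The file name and column names (commute_minutes, willing_to_shift) are hypothetical; rename them to match the survey file.

```python
# Minimal sketch: segment commuters by travel time and flexibility.
# File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

survey = pd.read_csv("flowsync_commuter_survey.csv")

features = pd.DataFrame({
    "commute_minutes": survey["commute_minutes"],
    "flexible": survey["willing_to_shift"].map({"Yes": 1, "No": 0}),
}).dropna()

X = StandardScaler().fit_transform(features)
features["segment"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(features.groupby("segment").mean())  # average duration and flexibility per segment
```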

    🌍 Impact

    By publishing this dataset openly, FlowSync aims to contribute towards building smarter mobility solutions that improve commuter well-being, reduce traffic load, and promote sustainable transport policies for Indian cities.

