76 datasets found
  1. d

    Open Data Website Traffic

    • catalog.data.gov
    • data.lacity.org
    • +2more
    Updated Jun 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic
    Explore at:
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    data.lacity.org
    Description

    Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly

  2. g

    Website Traffic Dataset

    • gts.ai
    json
    Updated Aug 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GTS (2024). Website Traffic Dataset [Dataset]. https://gts.ai/dataset-download/website-traffic-dataset/
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Aug 23, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.

  3. 15,000 Music Tracks - 19 Genres (w/ Spotify Data)

    • kaggle.com
    Updated Aug 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Bumpkin (2024). 15,000 Music Tracks - 19 Genres (w/ Spotify Data) [Dataset]. https://www.kaggle.com/datasets/thebumpkin/10400-classic-hits-10-genres-1923-to-2023/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Bumpkin
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset is a comprehensive collection of 15,150 classic hits from 3,083 artists, spanning a century of music history from 1923 to 2023. This diverse dataset is divided into 19 distinct genres, showcasing the evolution of popular music across different eras and styles. Each track in the dataset is enriched with Spotify audio features, offering detailed insights into the acoustic properties, rhythm, tempo, and other musical characteristics. This makes the dataset not only a valuable resource for exploring trends and comparing genres but also for analyzing the sonic qualities that define classic hits across different time periods and genres.

  4. Spotify Top 50 Tracks 2023

    • kaggle.com
    Updated Feb 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    yuka_with_data (2024). Spotify Top 50 Tracks 2023 [Dataset]. https://www.kaggle.com/datasets/yukawithdata/spotify-top-tracks-2023
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Kaggle
    Authors
    yuka_with_data
    Description

    💁‍♀️Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.

    Dataset Description:

    This dataset compiles the tracks from Spotify's official "Top Tracks of 2023" playlist, showcasing the most popular and influential music of the year according to Spotify's streaming data. It represents a wide range array of genres, artists, and musical styles that have defined the musical landscapes of 2023. Each track in the dataset is detailed with a variety of features, popularity, and metadata. This dataset serves as an excellent resource for music enthusiasts, data analysts, and researchers aiming to explore music trends or develop music recommendation systems based on empirical data.

    Data Collection and Processing:

    Obtaining the Data:

    The data was obtained directly from the Spotify Web API, specifically from the "Top Tracks of 2023" official playlist curated by Spotify. The Spotify API provides detailed information about tracks, artists, and albums through various endpoints.

    Data Processing:

    To process and structure the data, I developed Python scripts using data science libraries such as pandas for data manipulation and spotipy for API interactions specifically for Spotify data retrieval.

    Workflow:

    1. Authentification
    2. API Requests
    3. Data Cleaning and Transformation
    4. Saving the Data

    Attribute Descriptions:

    • artist_name: the artist name
    • track_name: the title of the track
    • is_explicit: Indicates whether the track contains explicit content
    • album_release_date: The date when the track was released
    • genres: A list of genres associated with the track's artist(s)
    • danceability: A measure from 0.0 to 1.0 indicating how suitable a track is for dancing based on a combination of musical elements
    • valence: A measure from 0.0 to 1.0 indicating the musical positiveness conveyed by a track
    • energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity
    • loudness: The overall loudness of a track in decibels (dB)
    • acousticness: A measure from 0.0 to 1.0 whether the track is acoustic.
    • instrumentalness: Predicts whether a track contains no vocals
    • liveness: Detects the presence of an audience in the recordings
    • speechiness: Detects the presence of spoken words in a track
    • key: The key the track is in. Integers map to pitches using standard Pitch Class notation.
    • tempo: The overall estimated tempo of a track in beats per minute (BPM)
    • mode: Modality of the track
    • duration_ms: The length of the track in milliseconds
    • time_signature: An estimated overall time signature of a track
    • popularity: A score between 0 and 100, with 100 being the most popular

    Possible Data Projects

    • Trends Analysis
    • Genre Popularity
    • Mood and Music
    • Comparison with other tracks

    Disclaimer and Responsible Use:

    • This dataset, derived from Spotify's "Top Tracks of 2023" playlist, is intended for educational, research, and analysis purposes only. Users are urged to use this data responsibly and ethically.
    • Users should comply with Spotify's Terms of Service and Developer Policies when using this dataset.
    • The dataset includes music track information such as names and artist details, which are subject to copyright. While the dataset presents this information for analytical purposes, it does not convey any rights to the music itself.
    • Users of the dataset must ensure that their use does not infringe on the rights of copyright holders. Any analysis, distribution, or derivative work should respect the intellectual property rights of all parties and comply with applicable laws.
    • The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for the use of the dataset by others. Users are responsible for ensuring their use of the dataset is legal and ethical.
    • For the most accurate and up-to-date information regarding Spotify's music, playlists, and policies, users are encouraged to refer directly to Spotify's official website. This ensures that users have access to the latest details directly from the source.
    • The creator/maintainer of this dataset is not affiliated with Spotify, any third-party entities, or artists mentioned within the dataset. This project is independent and has not been authorized, sponsored, or otherwise approved by Spotify or any other mentioned entities.

    Contribution

    I encourage users who discover new insights, propose dataset enhancements, or craft analytics that illuminate aspects of the dataset's focus to share their findings with the community. - Kaggle Notebooks: To facilitate sharing and collaboration, users are encouraged to create and share their analyses through Kaggle notebooks. For ease of use, start your notebook by clicking "New Notebook" atop this dataset’s page on K...

  5. Network traffic datasets created by Single Flow Time Series Analysis

    • zenodo.org
    • explore.openaire.eu
    • +1more
    csv, pdf
    Updated Jul 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Josef Koumar; Josef Koumar; Karel Hynek; Karel Hynek; Tomáš Čejka; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724
    Explore at:
    csv, pdfAvailable download formats
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Josef Koumar; Josef Koumar; Karel Hynek; Karel Hynek; Tomáš Čejka; Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets created by Single Flow Time Series Analysis

    Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

    J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

    In the following table is a description of each dataset file:

    File nameDetection problemCitation of original raw dataset
    botnet_binary.csv Binary detection of botnet S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv Multi-class classification of botnet S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csvBinary detection of cryptomining; the design part Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    cryptomining_evaluation.csv Binary detection of cryptomining; the evaluation part Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    dns_malware.csv Binary detection of malware DNS Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv Binary detection of DoH

    Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020

    doh_real_world.csv Binary detection of DoH Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
    dos.csv Binary detection of DoS Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv Binary detection of IoT malware Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csvMulti-class classification of IoT malwareMohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csvBinary detection of HTTPS Brute ForceJan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
    ids_cic_binary.csvBinary detection of intrusion in IDSIman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_cic_multiclass.csv Multi-class classification of intrusion in IDS Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_unsw_nb_15_binary.csv Binary detection of intrusion in IDS Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    ids_unsw_nb_15_multiclass.csv Multi-class classification of intrusion in IDS Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv Binary detection of IoT malware Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org /datasets-iot23
    ton_iot_binary.csv Binary detection of IoT malware Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    ton_iot_multiclass.csv Multi-class classification of IoT malware Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    tor_binary.csv Binary detection of TOR Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    tor_multiclass.csv Multi-class classification of TOR Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    vpn_iscx_binary.csv Binary detection of VPN Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_iscx_multiclass.csv Multi-class classification of VPN Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_vnat_binary.csv Binary detection of VPN Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
    vpn_vnat_multiclass.csvMulti-class classification of VPN Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

  6. Traffic Flow Data Jan to June 2023 SDCC

    • data.gov.ie
    • hub.arcgis.com
    • +1more
    Updated Jul 1, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.gov.ie (2023). Traffic Flow Data Jan to June 2023 SDCC [Dataset]. https://data.gov.ie/dataset/traffic-flow-data-jan-to-june-2023-sdcc1
    Explore at:
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    data.gov.ie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SDCC Traffic Congestion Saturation Flow Data for January to June 2023. Traffic volumes, traffic saturation, and congestion data for sites across South Dublin County. Used by traffic management to control stage timings on junctions. It is recommended that this dataset is read in conjunction with the ‘Traffic Data Site Names SDCC’ dataset.A detailed description of each column heading can be referenced below;scn: Site Serial numberregion: A group of Nodes that are operated under SCOOT control at the same common cycle time. Normally these will be nodes between which co-ordination is desirable. Some of the nodes may be double cycling at half of the region cycle time.system: SCOOT STC UTC (UTC-MX)locn: Locationssite: Site numbersday: Days of the week Monday to Sunday. Abbreviations; MO,TU,WE,TH,FR,SA,SU.date: Reflects correct actual Date of when data was collected.start_time: NOTE - Please ignore the date displayed in this column. The actual data collection date is correctly displayed in the 'date' column. The date displayed here is the date of when report was run and extracted from the system, but correctly reflects start time of 15 minute intervals. end_time: End time of 15 minute intervals.flow: A representation of demand (flow) for each link built up over several minutes by the SCOOT model. SCOOT has two profiles:(1) Short – Raw data representing the actual values over the previous few minutes(2) Long – A smoothed average of values over a longer periodSCOOT will choose to use the appropriate profile depending on a number of factors.flow_pc: Same as above ref PC SCOOTcong: Congestion is directly measured from the detector. If the detector is placed beyond the normal end of queue in the street it is rarely covered by stationary traffic, except of course when congestion occurs. If any detector shows standing traffic for the whole of an interval this is recorded. The number of intervals of congestion in any cycle is also recorded.The percentage congestion is calculated from:No of congested intervals x 4 x 100 cycle time in seconds.This percentage of congestion is available to view and more importantly for the optimisers to take into account.cong_pc: Same as above ref PC SCOOTdsat: The ratio of the demand flow to the maximum possible discharge flow, i.e. it is the ratio of the demand to the discharge rate (Saturation Occupancy) multiplied by the duration of the effective green time. The Split optimiser will try to minimise the maximum degree of saturation on links approaching the node.

  7. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?

  8. Data from: Comparison of Approaches for Determining Bioactivity Hits from...

    • catalog.data.gov
    • datasets.ai
    Updated Jul 20, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data [Dataset]. https://catalog.data.gov/dataset/comparison-of-approaches-for-determining-bioactivity-hits-from-high-dimensional-profiling-
    Explore at:
    Dataset updated
    Jul 20, 2021
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    ** Note to Josh Harrill- I don't have a copy of the final manuscript so could you please add the description of this dataset (just delete this comment and enter or cut and paste and then it should be ready to route by clicking on 'Submit for Review' button above) *. This dataset is associated with the following publication: Nyffeler, J., D. Haggard, C. Willis, W. Setzer, R. Judson, K. Paul-Friedman, L. Everett, and J. Harrill. Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data. SLAS Discovery. SAGE Publications, THOUSAND OAKS, CA, USA, 26(2): 292-308, (2021).

  9. Kaggle Wikipedia Web Traffic Daily Dataset (with Missing Values)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 1, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rakshitha Godahewa; Rakshitha Godahewa; Christoph Bergmeir; Christoph Bergmeir; Geoff Webb; Geoff Webb; Rob Hyndman; Rob Hyndman; Pablo Montero-Manso; Pablo Montero-Manso (2021). Kaggle Wikipedia Web Traffic Daily Dataset (with Missing Values) [Dataset]. http://doi.org/10.5281/zenodo.4656080
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Rakshitha Godahewa; Rakshitha Godahewa; Christoph Bergmeir; Christoph Bergmeir; Geoff Webb; Geoff Webb; Rob Hyndman; Rob Hyndman; Pablo Montero-Manso; Pablo Montero-Manso
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.

  10. i

    Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...

    • ieee-dataport.org
    Updated Oct 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
    Explore at:
    Dataset updated
    Oct 21, 2024
    Authors
    Mohamad Amar Irsyad Mohd Aminuddin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of Tor cell file extracted from browsing simulation using Tor Browser. The simulations cover both desktop and mobile webpages. The data collection process was using WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the neccessary configuration to perform the simulation as detailed in the tool repository.The webpage URL is selected by using the first 100 website based on: https://dataforseo.com/free-seo-stats/top-1000-websites.Each webpage URL is visited 90 times for each deskop and mobile browsing mode.

  11. f

    Summary of results comparing Google Analytics and SimilarWeb for total...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Summary of results comparing Google Analytics and SimilarWeb for total visits, unique visitors, bounce rate, and average session duration. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t006
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Difference uses Google Analytics as the Baseline. Results based on Paired t-Test for Hypotheses Supported.

  12. m

    USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven

    • app.mobito.io
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-enriched-geospatial-framework-dataset
    Explore at:
    Area covered
    United States
    Description

    Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA (Including Point of Interest (POI) Data and Foot Traffic. The dataset is divided into 150x150 sqm areas (geohash 7) and has over 50 variables. - Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.- Ideal for trainning ML models. The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguished this dataset is the use of anonymized and user-compliant mobile device GPS location, enriched with other alternative and public data.- Easy to use: Our dataset is user-friendly and can be easily integrated to your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements. - Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation.Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).Answer questions like: - What places does my target user visit in a particular area? Which are the best areas to place a new POS?- What is the average yearly income of users in a particular area?- What is the influx of visits that my competition receives?- What is the volume of traffic surrounding my current POS?This dataset is useful for getting insights from industries like:- Retail & FMCG- Banking, Finance, and Investment- Car Dealerships- Real Estate- Convenience Stores- Pharma and medical laboratories- Restaurant chains and franchises- Clothing chains and franchisesOur dataset includes more than 50 variables, such as:- Number of pedestrians seen in the area.- Number of vehicles seen in the area.- Average speed of movement of the vehicles seen in the area.- Point of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others). - Average yearly income range (anonymized and aggregated) of the devices seen in the area.Notes to better understand this dataset:- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library. - Category confidences, for example"food_drinks_tobacco_retail_confidence" indicates how confident we are in the existence of food/drink/tobacco retail locations in the area. - We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.How efficient is a Geohash?Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use.Geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).

  13. p

    Traffic Cameras - Dataset - CKAN

    • ckan0.cf.opendata.inter.prod-toronto.ca
    Updated Oct 8, 2013
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2013). Traffic Cameras - Dataset - CKAN [Dataset]. https://ckan0.cf.opendata.inter.prod-toronto.ca/gl_ES/dataset/traffic-cameras
    Explore at:
    Dataset updated
    Oct 8, 2013
    Description

    The Traffic Camera dataset contains the location and number for every Traffic camera in the City of Toronto. These datasets will be updated within 2 minutes when cameras are added, changed, or removed. The camera list files can be found at: https://opendata.toronto.ca/transportation/tmc/rescucameraimages/Data/ tmcearthcameras.csv - CSV, camera list in CSV tmcearthcameras.json - json formatted list. tmcearthcamerassn.json - json formatted file containing the timestamp of the list files. tmcearthcameras.xml - xml formatted list. TMCEarthCameras.xsd - xml schema document. The dataset includes the number, name, WGS84 information (latitude, longitude), comparison directions (1- Looking North, 2-Looking East, 3-Looking South and 4-Looking West), and camera group. The camera images associated with the dataset can be found at: https://opendata.toronto.ca/transportation/tmc/rescucameraimages/CameraImages. And the comparison images can be found at: https://opendata.toronto.ca/transportation/tmc/rescucameraimages/ComparisonImages. The camera image file name is created as follows: loc####.jpg - where #### is the camera number. (i.e. loc1234.jpg) The camera comparison image file names are created as follows: loc####D.jpg - where #### is the camera number and D is the direction. (i.e. loc1234e.jpg and loc1234w.jpg) The camera images are displayed on the City's website at http://www.toronto.ca/rescu/index.htmor http://www.toronto.ca/rescu/list.htm

  14. d

    Air Traffic Landings Statistics

    • catalog.data.gov
    • data.sfgov.org
    • +1more
    Updated Aug 23, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.sfgov.org (2025). Air Traffic Landings Statistics [Dataset]. https://catalog.data.gov/dataset/air-traffic-landings-statistics
    Explore at:
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY This dataset consists of San Francisco International Airport (SFO) The aircraft landing dataset contains data about aircraft landings at SFO with monthly landing counts and landed weight by airline, region and aircraft model and type. B. HOW THE DATASET IS CREATED Data is self-reported by airlines and is only available at a monthly level. C. UPDATE PROCESS Data is available starting in July 1999 and will be updated monthly. D. HOW TO USE THIS DATASET Airport data is seasonal in nature; therefore, any comparative analyses should be done on a period-over-period basis (i.e. January 2010 vs. January 2009) as opposed to period-to-period (i.e. January 2010 vs. February 2010). It is also important to note that fact and attribute field relationships are not always 1-to-1. For example, Cargo Statistics belonging to United Airlines will appear in multiple attribute fields and are additive, which provides flexibility for the user to derive categorical Cargo Statistics as desired. E. RELATED DATASETS A summary of monthly comparative air-traffic statistics is also available on SFO’s internet site at https://www.flysfo.com/about/media/facts-statistics/air-traffic-statistics

  15. A

    ‘K-Pop Hits Through The Years’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘K-Pop Hits Through The Years’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-k-pop-hits-through-the-years-0b70/be8b4573/?iid=032-298&v=presentation
    Explore at:
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘K-Pop Hits Through The Years’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sberj127/kpop-hits-through-the-years on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    What is the data?

    The datasets contain the top songs from the said era or year accordingly (as presented in the name of each dataset). Note that only the KPopHits90s dataset represents an era (1989-2001). Although there is a lack of easily available and reliable sources to show the actual K-Pop hits per year during the 90s, this era was still included as this time period was when the first generation of K-Pop stars appeared. Each of the other datasets represent a specific year after the 90s.

    How was it obtained?

    A song is considered to be a K-Pop hit during that era or year if it is included in the annual series of K-Pop Hits playlists, which is created officially by Apple Music. Note that for the dataset that represents the 90s, the playlist 90s K-Pop Essentials was used as the reference.

    1. These playlists were transferred into Spotify through the Tune My Music site. After transferring, the site also presented all the missing songs from each Spotify playlist when compared to the original Apple Music playlists.
      • Any data besides the names and artists of the hit songs were not directly obtained from Apple Music since these other details of songs in this music service are only available for those enrolled as members of the Apple Developer Program.
    2. The presented missing songs from each playlist was manually searched and, if found, added to the respective Spotify playlist.
      • For the songs that were found, there are three types: (1) the song by the original artist, (2) the instrumental of the original song and (3) a cover of the song. When the first type is not found, the two other types are searched and are compared to each other. The one that sounded the most like the original song (from the Apple Music playlist) is chosen as the substitute in the Spotify playlist.
      • Presented is a link containing all the missing data per playlist (when the initial Spotify playlists were compared to the original Apple Music playlists) and the action done to each one.
    3. The necessary identification details and specific audio features of each track were obtained through the use of the Spotipy library and Spotify Web API documentation.

    Why did you make this?

    As someone who has a particular curiosity to the field of data science and a genuine love for the musicality in the K-Pop scene, this data set was created to make something out of the strong interest I have for these separate subjects.

    Acknowledgements

    I would like to express my sincere gratitude to Apple Music for creating the annual K-Pop playlists, Spotify for making their API very accessible, Spotipy for making it easier to get the desired data from the Spotify Web API, Tune My Music for automating the process of transferring one's library into another service's library and, of course, all those involved in the making of these songs and artists included in these datasets for creating such high quality music and concepts digestible even for the general public.

    --- Original source retains full ownership of the source dataset ---

  16. s

    Traffic Flow Data Jan to June 2020 SDCC - Dataset - data.smartdublin.ie

    • data.smartdublin.ie
    Updated Feb 16, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Traffic Flow Data Jan to June 2020 SDCC - Dataset - data.smartdublin.ie [Dataset]. https://data.smartdublin.ie/dataset/traffic-flow-data-jan-to-june-2020-sdcc1
    Explore at:
    Dataset updated
    Feb 16, 2021
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SDCC Traffic Congestion Saturation Flow Data for January to June 2020. Traffic volumes, traffic saturation, and congestion data for sites across South Dublin County. Used by traffic management to control stage timings on junctions. It is recommended that this dataset is read in conjunction with the ‘Traffic Data Site Names SDCC’ dataset.A detailed description of each column heading can be referenced below;scn: Site Serial numberregion: A group of Nodes that are operated under SCOOT control at the same common cycle time. Normally these will be nodes between which co-ordination is desirable. Some of the nodes may be double cycling at half of the region cycle time.system: SCOOT STC UTC (UTC-MX)locn: Locationssite: Site numbersday: Days of the week Monday to Sunday. Abbreviations; MO,TU,WE,TH,FR,SA,SU.date: Reflects correct actual Date of when data was collected.start_time: NOTE - Please ignore the date displayed in this column. The actual data collection date is correctly displayed in the 'date' column. The date displayed here is the date of when report was run and extracted from the system, but correctly reflects start time of 15 minute intervals. end_time: End time of 15 minute intervals.flow: A representation of demand (flow) for each link built up over several minutes by the SCOOT model. SCOOT has two profiles:(1) Short – Raw data representing the actual values over the previous few minutes(2) Long – A smoothed average of values over a longer periodSCOOT will choose to use the appropriate profile depending on a number of factors.flow_pc: Same as above ref PC SCOOTcong: Congestion is directly measured from the detector. If the detector is placed beyond the normal end of queue in the street it is rarely covered by stationary traffic, except of course when congestion occurs. If any detector shows standing traffic for the whole of an interval this is recorded. The number of intervals of congestion in any cycle is also recorded.The percentage congestion is calculated from:No of congested intervals x 4 x 100 cycle time in seconds.This percentage of congestion is available to view and more importantly for the optimisers to take into account.cong_pc: Same as above ref PC SCOOTdsat: The ratio of the demand flow to the maximum possible discharge flow, i.e. it is the ratio of the demand to the discharge rate (Saturation Occupancy) multiplied by the duration of the effective green time. The Split optimiser will try to minimise the maximum degree of saturation on links approaching the node.

  17. Data from: Analysis of the Quantitative Impact of Social Networks General...

    • figshare.com
    • produccioncientifica.ucm.es
    doc
    Updated Oct 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1
    Explore at:
    docAvailable download formats
    Dataset updated
    Oct 14, 2022
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General data recollected for the studio " Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: what percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is said percentage higher or lower than that provided through direct traffic and through the use of search engines via SEO positioning? Which social networks have a greater impact? And is there any degree of relationship between the specific weight of social networks in the web traffic of a cybermedia and circumstances such as the average duration of the user's visit, the number of page views or the bounce rate understood in its formal aspect of not performing any kind of interaction on the visited page beyond reading its content? To answer these questions, we have first proceeded to a selection of the cybermedia with the highest web traffic of the 27 countries that are currently part of the European Union after the United Kingdom left on December 31, 2020. In each nation we have selected five media using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https:// www.similarweb.com/). We have not used local metrics by country since the results obtained with these first two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, cybermedia whose property corresponds to a journalistic company have been selected, ruling out those belonging to telecommunications portals or service providers; in some cases they correspond to classic information companies (both newspapers and televisions) while in others they refer to digital natives, without this circumstance affecting the nature of the research proposed.
    Below we have proceeded to examine the web traffic data of said cybermedia. The period corresponding to the months of October, November and December 2021 and January, February and March 2022 has been selected. We believe that this six-month stretch allows possible one-time variations to be overcome for a month, reinforcing the precision of the data obtained. To secure this data, we have used the SimilarWeb tool, currently the most precise tool that exists when examining the web traffic of a portal, although it is limited to that coming from desktops and laptops, without taking into account those that come from mobile devices, currently impossible to determine with existing measurement tools on the market. It includes:

    Web traffic general data: average visit duration, pages per visit and bounce rate Web traffic origin by country Percentage of traffic generated from social media over total web traffic Distribution of web traffic generated from social networks Comparison of web traffic generated from social netwoks with direct and search procedures

  18. f

    Comparison of the Predictive Performance and Interpretability of Random...

    • acs.figshare.com
    • figshare.com
    zip
    Updated Jun 5, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley (2023). Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets [Dataset]. http://doi.org/10.1021/acs.jcim.6b00753.s006
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The ability to interpret the predictions made by quantitative structure–activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for heat map generation.

  19. g

    A comprehensive dataset of website traffic

    • gimi9.com
    • data.europa.eu
    Updated Jul 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). A comprehensive dataset of website traffic [Dataset]. https://gimi9.com/dataset/eu_https-open-bydata-de-api-hub-repo-datasets-https-mediatum-ub-tum-de-1700647-dataset/
    Explore at:
    Dataset updated
    Jul 14, 2024
    Description

    The dataset contains traffic collected for 96 websites located in

  20. i

    NAT Network Traffic Dataset

    • ieee-dataport.org
    Updated Sep 17, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sameh Farhat (2020). NAT Network Traffic Dataset [Dataset]. https://ieee-dataport.org/documents/nat-network-traffic-dataset
    Explore at:
    Dataset updated
    Sep 17, 2020
    Authors
    Sameh Farhat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network Address Translation (NAT)

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic

Open Data Website Traffic

Explore at:
Dataset updated
Jun 21, 2025
Dataset provided by
data.lacity.org
Description

Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly

Search
Clear search
Close search
Google apps
Main menu