76 datasets found

Open Data Website Traffic
catalog.data.gov
data.lacity.org
+2 more
Updated Jun 21, 2025
Cite
data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic
Dataset provided by
data.lacity.org
Description
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly.
Website Traffic Dataset
gts.ai
json
Updated Aug 23, 2024
Cite
GTS (2024). Website Traffic Dataset [Dataset]. https://gts.ai/dataset-download/website-traffic-dataset/
Available download formats: json
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.
15,000 Music Tracks - 19 Genres (w/ Spotify Data)
kaggle.com
Updated Aug 30, 2024
Cite
The Bumpkin (2024). 15,000 Music Tracks - 19 Genres (w/ Spotify Data) [Dataset]. https://www.kaggle.com/datasets/thebumpkin/10400-classic-hits-10-genres-1923-to-2023/code
Available format: Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Bumpkin
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The dataset is a comprehensive collection of 15,150 classic hits from 3,083 artists, spanning a century of music history from 1923 to 2023. This diverse dataset is divided into 19 distinct genres, showcasing the evolution of popular music across different eras and styles. Each track in the dataset is enriched with Spotify audio features, offering detailed insights into the acoustic properties, rhythm, tempo, and other musical characteristics. This makes the dataset not only a valuable resource for exploring trends and comparing genres but also for analyzing the sonic qualities that define classic hits across different time periods and genres.
Spotify Top 50 Tracks 2023
kaggle.com
Updated Feb 8, 2024
Cite
yuka_with_data (2024). Spotify Top 50 Tracks 2023 [Dataset]. https://www.kaggle.com/datasets/yukawithdata/spotify-top-tracks-2023
Available format: Croissant, a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset provided by
Kaggle
Authors
yuka_with_data
Description
💁‍♀️Please take a moment to carefully read through this description and metadata to better understand the dataset and its nuances before proceeding to the Suggestions and Discussions section.

Dataset Description:

This dataset compiles the tracks from Spotify's official "Top Tracks of 2023" playlist, showcasing the most popular and influential music of the year according to Spotify's streaming data. It represents a wide array of genres, artists, and musical styles that defined the musical landscape of 2023. Each track in the dataset is described by audio features, a popularity score, and metadata. This dataset serves as a resource for music enthusiasts, data analysts, and researchers aiming to explore music trends or develop music recommendation systems based on empirical data.

Data Collection and Processing:

Obtaining the Data:

The data was obtained directly from the Spotify Web API, specifically from the "Top Tracks of 2023" official playlist curated by Spotify. The Spotify API provides detailed information about tracks, artists, and albums through various endpoints.

Data Processing:

To process and structure the data, I developed Python scripts using pandas for data manipulation and spotipy for retrieving data from the Spotify API.

Workflow:

Authentication

API Requests

Data Cleaning and Transformation

Saving the Data

Attribute Descriptions:

artist_name: The artist name

track_name: The title of the track

is_explicit: Indicates whether the track contains explicit content

album_release_date: The date when the track was released

genres: A list of genres associated with the track's artist(s)

danceability: A measure from 0.0 to 1.0 indicating how suitable a track is for dancing based on a combination of musical elements

valence: A measure from 0.0 to 1.0 indicating the musical positiveness conveyed by a track

energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity

loudness: The overall loudness of a track in decibels (dB)

acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic

instrumentalness: Predicts whether a track contains no vocals

liveness: Detects the presence of an audience in the recordings

speechiness: Detects the presence of spoken words in a track

key: The key the track is in. Integers map to pitches using standard Pitch Class notation.

tempo: The overall estimated tempo of a track in beats per minute (BPM)

mode: Modality of the track (major or minor)

duration_ms: The length of the track in milliseconds

time_signature: An estimated overall time signature of a track

popularity: A score between 0 and 100, with 100 being the most popular
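
As a rough illustration of how the `key`, `mode`, and `duration_ms` attributes above can be made readable, here is a minimal sketch. It assumes the usual pitch-class convention (0 = C, 1 = C#/Db, and so on), that `mode` is 1 for major and 0 for minor, and that a key of -1 means no key was detected, as in Spotify's audio-features documentation. The helper names `describe_key` and `format_duration` are hypothetical and not part of the dataset or of spotipy.

```python
# Pitch classes in standard Pitch Class notation: index 0 is C, index 11 is B.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def describe_key(key: int, mode: int) -> str:
    """Turn integer key/mode columns into a label such as 'A minor'.

    Assumption: key is 0-11 (anything else means 'no key detected'),
    and mode is 1 for major, 0 for minor.
    """
    if not 0 <= key <= 11:
        return "unknown"
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"


def format_duration(duration_ms: int) -> str:
    """Convert a duration in milliseconds to a 'minutes:seconds' string."""
    total_seconds = duration_ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


if __name__ == "__main__":
    print(describe_key(9, 0))
    print(format_duration(215000))
```

Applied to a pandas DataFrame, the same helpers could be mapped over the `key`, `mode`, and `duration_ms` columns to add human-readable fields before analysis.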

Possible Data Projects

Trends Analysis

Genre Popularity

Mood and Music

Comparison with other tracks

Disclaimer and Responsible Use:

This dataset, derived from Spotify's "Top Tracks of 2023" playlist, is intended for educational, research, and analysis purposes only. Users are urged to use this data responsibly and ethically.

Users should comply with Spotify's Terms of Service and Developer Policies when using this dataset.

The dataset includes music track information such as names and artist details, which are subject to copyright. While the dataset presents this information for analytical purposes, it does not convey any rights to the music itself.

Users of the dataset must ensure that their use does not infringe on the rights of copyright holders. Any analysis, distribution, or derivative work should respect the intellectual property rights of all parties and comply with applicable laws.

The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for the use of the dataset by others. Users are responsible for ensuring their use of the dataset is legal and ethical.

For the most accurate and up-to-date information regarding Spotify's music, playlists, and policies, users are encouraged to refer directly to Spotify's official website. This ensures that users have access to the latest details directly from the source.

The creator/maintainer of this dataset is not affiliated with Spotify, any third-party entities, or artists mentioned within the dataset. This project is independent and has not been authorized, sponsored, or otherwise approved by Spotify or any other mentioned entities.

Contribution

I encourage users who discover new insights, propose dataset enhancements, or craft analytics that illuminate aspects of the dataset's focus to share their findings with the community. - Kaggle Notebooks: To facilitate sharing and collaboration, users are encouraged to create and share their analyses through Kaggle notebooks. For ease of use, start your notebook by clicking "New Notebook" atop this dataset’s page on K...

Network traffic datasets created by Single Flow Time Series Analysis

zenodo.org
explore.openaire.eu
+1 more

csv, pdf

Updated Jul 11, 2024

+ more versions

Cite

Josef Koumar; Karel Hynek; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724

Available download formats: csv, pdf

Unique identifier

https://doi.org/10.5281/zenodo.8035724

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Josef Koumar; Karel Hynek; Tomáš Čejka

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Network traffic datasets created by Single Flow Time Series Analysis

Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

The following table describes each dataset file:

File name	Detection problem	Citation of original raw dataset
botnet_binary.csv	Binary detection of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv	Multi-class classification of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv	Binary detection of cryptomining; the design part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv	Binary detection of cryptomining; the evaluation part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv	Binary detection of malware DNS	Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv	Binary detection of DoH	Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
doh_real_world.csv	Binary detection of DoH	Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv	Binary detection of DoS	Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv	Binary detection of IoT malware	Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv	Multi-class classification of IoT malware	Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
https_brute_force.csv	Binary detection of HTTPS Brute Force	Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
ids_cic_binary.csv	Binary detection of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_cic_multiclass.csv	Multi-class classification of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_unsw_nb_15_binary.csv	Binary detection of intrusion in IDS	Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
ids_unsw_nb_15_multiclass.csv	Multi-class classification of intrusion in IDS	Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv	Binary detection of IoT malware	Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here: https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv	Binary detection of IoT malware	Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
ton_iot_multiclass.csv	Multi-class classification of IoT malware	Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
tor_binary.csv	Binary detection of TOR	Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
tor_multiclass.csv	Multi-class classification of TOR	Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
vpn_iscx_binary.csv	Binary detection of VPN	Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
vpn_iscx_multiclass.csv	Multi-class classification of VPN	Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
vpn_vnat_binary.csv	Binary detection of VPN	Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
vpn_vnat_multiclass.csv	Multi-class classification of VPN	Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

Traffic Flow Data Jan to June 2023 SDCC
data.gov.ie
hub.arcgis.com
+1 more
Updated Jul 1, 2023
+ more versions
Cite
data.gov.ie (2023). Traffic Flow Data Jan to June 2023 SDCC [Dataset]. https://data.gov.ie/dataset/traffic-flow-data-jan-to-june-2023-sdcc1
Dataset provided by
data.gov.ie
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
SDCC Traffic Congestion Saturation Flow Data for January to June 2023. Traffic volumes, traffic saturation, and congestion data for sites across South Dublin County. Used by traffic management to control stage timings on junctions. It is recommended that this dataset is read in conjunction with the ‘Traffic Data Site Names SDCC’ dataset.
A detailed description of each column heading is given below:
scn: Site serial number
region: A group of nodes that are operated under SCOOT control at the same common cycle time. Normally these will be nodes between which co-ordination is desirable. Some of the nodes may be double cycling at half of the region cycle time.
system: SCOOT STC UTC (UTC-MX)
locn: Locations
site: Site numbers
day: Days of the week, Monday to Sunday. Abbreviations: MO, TU, WE, TH, FR, SA, SU.
date: The actual date on which the data was collected.
start_time: NOTE - Please ignore the date displayed in this column. The actual data collection date is correctly displayed in the 'date' column. The date displayed here is the date on which the report was run and extracted from the system, but the column correctly reflects the start time of the 15-minute intervals.
end_time: End time of the 15-minute intervals.
flow: A representation of demand (flow) for each link, built up over several minutes by the SCOOT model. SCOOT has two profiles:
(1) Short – Raw data representing the actual values over the previous few minutes
(2) Long – A smoothed average of values over a longer period
SCOOT will choose to use the appropriate profile depending on a number of factors.
flow_pc: Same as above ref PC SCOOT
cong: Congestion is directly measured from the detector. If the detector is placed beyond the normal end of queue in the street it is rarely covered by stationary traffic, except of course when congestion occurs. If any detector shows standing traffic for the whole of an interval this is recorded. The number of intervals of congestion in any cycle is also recorded. The percentage congestion is calculated from: (No. of congested intervals x 4 x 100) / cycle time in seconds. This percentage of congestion is available to view and, more importantly, for the optimisers to take into account.
cong_pc: Same as above ref PC SCOOT
dsat: The ratio of the demand flow to the maximum possible discharge flow, i.e. it is the ratio of the demand to the discharge rate (Saturation Occupancy) multiplied by the duration of the effective green time. The Split optimiser will try to minimise the maximum degree of saturation on links approaching the node.
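
The percentage-congestion calculation in the `cong` description can be sketched as a small function. This is a minimal illustration, assuming the formula reads as congested intervals x 4 x 100 divided by the cycle time in seconds (the source text runs the terms together, so the division is an interpretation), with the factor 4 taken as the interval length in seconds. The function name `percent_congestion` and the example numbers are hypothetical.

```python
def percent_congestion(congested_intervals: int,
                       cycle_time_s: float,
                       interval_s: float = 4.0) -> float:
    """Percentage of a signal cycle during which the detector saw standing traffic.

    Assumption: each congested interval covers `interval_s` seconds (4 in the
    dataset description), so the congested time is intervals * interval_s.
    """
    if cycle_time_s <= 0:
        raise ValueError("cycle_time_s must be positive")
    return congested_intervals * interval_s * 100.0 / cycle_time_s


if __name__ == "__main__":
    # Hypothetical values: 5 congested intervals in an 80-second cycle.
    print(percent_congestion(5, 80))
```

Under this reading, a cycle in which every interval is congested yields 100 percent, which is a useful sanity check when comparing against the `cong` column.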
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Available download formats: zip (0 bytes)
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
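
The "real bounce rate" question above (the percentage of visits with a single pageview, per traffic source) can be sketched in plain Python. This is an illustrative sketch only: it assumes each session has already been reduced to a `(traffic_source, pageviews)` pair, and the function name and sample rows are hypothetical, not taken from the BigQuery table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bounce_rate_by_source(sessions: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the percentage of single-pageview visits for each traffic source."""
    totals: Dict[str, int] = defaultdict(int)
    bounces: Dict[str, int] = defaultdict(int)
    for source, pageviews in sessions:
        totals[source] += 1
        if pageviews == 1:  # a "bounce" is a visit with exactly one pageview
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / totals[source] for source in totals}


if __name__ == "__main__":
    # Hypothetical sessions, for illustration only.
    sample = [("google", 1), ("google", 3), ("direct", 1),
              ("direct", 1), ("google", 1), ("google", 2)]
    print(bounce_rate_by_source(sample))
```

Against the real data, the same grouping would typically be expressed as a SQL aggregation in BigQuery; the Python version just makes the definition explicit.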
Data from: Comparison of Approaches for Determining Bioactivity Hits from...
catalog.data.gov
datasets.ai
Updated Jul 20, 2021
+ more versions
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data [Dataset]. https://catalog.data.gov/dataset/comparison-of-approaches-for-determining-bioactivity-hits-from-high-dimensional-profiling-
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
This dataset is associated with the following publication: Nyffeler, J., D. Haggard, C. Willis, W. Setzer, R. Judson, K. Paul-Friedman, L. Everett, and J. Harrill. Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data. SLAS Discovery. SAGE Publications, THOUSAND OAKS, CA, USA, 26(2): 292-308, (2021).
Kaggle Wikipedia Web Traffic Daily Dataset (with Missing Values)
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 1, 2021
+ more versions
Cite
Rakshitha Godahewa; Christoph Bergmeir; Geoff Webb; Rob Hyndman; Pablo Montero-Manso (2021). Kaggle Wikipedia Web Traffic Daily Dataset (with Missing Values) [Dataset]. http://doi.org/10.5281/zenodo.4656080
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.4656080
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rakshitha Godahewa; Christoph Bergmeir; Geoff Webb; Rob Hyndman; Pablo Montero-Manso
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.
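
To make the stated date range concrete, the sketch below computes how many daily observations each series should hold and maps a calendar date to its position in a series. It assumes both endpoints (2015-07-01 and 2017-09-10) are included; the helper names are hypothetical and the archive's actual file layout is not modelled here.

```python
from datetime import date

# Endpoints taken from the dataset description; both assumed inclusive.
START = date(2015, 7, 1)
END = date(2017, 9, 10)


def series_length_days(start: date, end: date) -> int:
    """Number of daily observations between two dates, counting both endpoints."""
    return (end - start).days + 1


def day_index(day: date, start: date = START) -> int:
    """Zero-based position of a calendar date within a daily series."""
    offset = (day - start).days
    if offset < 0:
        raise ValueError("date precedes the start of the series")
    return offset


if __name__ == "__main__":
    print(series_length_days(START, END))
    print(day_index(date(2016, 1, 1)))
```

With inclusive endpoints this gives 803 observations per series, so a series that is shorter than that after loading would point to dropped rather than merely missing values.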
Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...
ieee-dataport.org
Updated Oct 21, 2024
Cite
Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
Authors
Mohamad Amar Irsyad Mohd Aminuddin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data was collected with the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs were selected from the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL is visited 90 times in each of the desktop and mobile browsing modes.
Summary of results comparing Google Analytics and SimilarWeb for total...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 13, 2023
Cite
Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Summary of results comparing Google Analytics and SimilarWeb for total visits, unique visitors, bounce rate, and average session duration. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0268212.t006
Dataset provided by
PLOS ONE
Authors
Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Difference uses Google Analytics as the Baseline. Results based on Paired t-Test for Hypotheses Supported.
USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven
app.mobito.io
Cite
USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-enriched-geospatial-framework-dataset
Area covered
United States
Description
Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) data and foot traffic. The dataset is divided into 150 m x 150 m areas (geohash 7) and has over 50 variables.
- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).
Answer questions like:
- What places does my target user visit in a particular area?
- Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?
This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises
Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.
Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.
How efficient is a Geohash?
Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. Geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).
Traffic Cameras - Dataset - CKAN
ckan0.cf.opendata.inter.prod-toronto.ca
Updated Oct 8, 2013
+ more versions
Cite
(2013). Traffic Cameras - Dataset - CKAN [Dataset]. https://ckan0.cf.opendata.inter.prod-toronto.ca/gl_ES/dataset/traffic-cameras
Description
The Traffic Camera dataset contains the location and number for every traffic camera in the City of Toronto. These datasets will be updated within 2 minutes when cameras are added, changed, or removed. The camera list files can be found at: https://opendata.toronto.ca/transportation/tmc/rescucameraimages/Data/
tmcearthcameras.csv - camera list in CSV format.
tmcearthcameras.json - JSON-formatted list.
tmcearthcamerassn.json - JSON-formatted file containing the timestamp of the list files.
tmcearthcameras.xml - XML-formatted list.
TMCEarthCameras.xsd - XML schema document.
The dataset includes the number, name, WGS84 information (latitude, longitude), comparison directions (1 - Looking North, 2 - Looking East, 3 - Looking South, and 4 - Looking West), and camera group. The camera images associated with the dataset can be found at: https://opendata.toronto.ca/transportation/tmc/rescucameraimages/CameraImages. The comparison images can be found at: https://opendata.toronto.ca/transportation/tmc/rescucameraimages/ComparisonImages.
The camera image file name is created as follows: loc####.jpg, where #### is the camera number (e.g. loc1234.jpg). The camera comparison image file names are created as follows: loc####D.jpg, where #### is the camera number and D is the direction (e.g. loc1234e.jpg and loc1234w.jpg).
The camera images are displayed on the City's website at http://www.toronto.ca/rescu/index.htm or http://www.toronto.ca/rescu/list.htm
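
The file-naming rules in this description can be sketched as a few helpers. Two points are assumptions rather than documented facts: camera numbers are zero-padded to four digits (inferred from the `####` placeholder), and the direction letter is lowercase n/e/s/w for directions 1 to 4 (inferred from the `loc1234e.jpg` and `loc1234w.jpg` examples). All function names are hypothetical.

```python
BASE_URL = "https://opendata.toronto.ca/transportation/tmc/rescucameraimages"

# Direction codes from the description: 1 North, 2 East, 3 South, 4 West.
# Assumption: the file-name suffix is the lowercase initial of the direction.
DIRECTION_SUFFIX = {1: "n", 2: "e", 3: "s", 4: "w"}


def camera_image_name(camera_number: int) -> str:
    """File name of a camera image, e.g. loc1234.jpg (4-digit padding assumed)."""
    return f"loc{camera_number:04d}.jpg"


def comparison_image_name(camera_number: int, direction: int) -> str:
    """File name of a comparison image, e.g. loc1234e.jpg for direction 2 (East)."""
    return f"loc{camera_number:04d}{DIRECTION_SUFFIX[direction]}.jpg"


def camera_image_url(camera_number: int) -> str:
    """Full URL of a camera image under the CameraImages folder."""
    return f"{BASE_URL}/CameraImages/{camera_image_name(camera_number)}"


if __name__ == "__main__":
    print(camera_image_name(1234))
    print(comparison_image_name(1234, 2))
    print(camera_image_url(1234))
```

Before relying on these names in bulk downloads, it would be worth checking a few against the camera list files, since the padding and suffix conventions are inferred.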
Air Traffic Landings Statistics
catalog.data.gov
data.sfgov.org
+1 more
Updated Aug 23, 2025
+ more versions
Cite
data.sfgov.org (2025). Air Traffic Landings Statistics [Dataset]. https://catalog.data.gov/dataset/air-traffic-landings-statistics
Dataset provided by
data.sfgov.org
Description
A. SUMMARY
The San Francisco International Airport (SFO) aircraft landing dataset contains data about aircraft landings at SFO, with monthly landing counts and landed weight by airline, region, and aircraft model and type.
B. HOW THE DATASET IS CREATED
Data is self-reported by airlines and is only available at a monthly level.
C. UPDATE PROCESS
Data is available starting in July 1999 and will be updated monthly.
D. HOW TO USE THIS DATASET
Airport data is seasonal in nature; therefore, any comparative analyses should be done on a period-over-period basis (e.g. January 2010 vs. January 2009) as opposed to period-to-period (e.g. January 2010 vs. February 2010). It is also important to note that fact and attribute field relationships are not always 1-to-1. For example, Cargo Statistics belonging to United Airlines will appear in multiple attribute fields and are additive, which provides flexibility for the user to derive categorical Cargo Statistics as desired.
E. RELATED DATASETS
A summary of monthly comparative air-traffic statistics is also available on SFO’s internet site at https://www.flysfo.com/about/media/facts-statistics/air-traffic-statistics
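
The period-over-period comparison recommended in section D can be sketched as follows. This is a minimal illustration that assumes monthly totals have already been aggregated into a dictionary keyed by `(year, month)`; the function name and the landing counts are hypothetical and do not come from the dataset.

```python
from typing import Dict, Tuple


def year_over_year_change(monthly: Dict[Tuple[int, int], float],
                          year: int, month: int) -> float:
    """Percent change of a month versus the same month one year earlier."""
    current = monthly[(year, month)]
    previous = monthly[(year - 1, month)]
    if previous == 0:
        raise ValueError("no baseline: previous-year value is zero")
    return 100.0 * (current - previous) / previous


if __name__ == "__main__":
    # Hypothetical landing counts, for illustration only.
    landings = {(2009, 1): 1000, (2010, 1): 1100, (2010, 2): 900}
    # Compare January 2010 with January 2009, not with February 2010.
    print(year_over_year_change(landings, 2010, 1))
```

Comparing each month with the same month of the previous year keeps seasonal effects out of the change figure, which is the point of the guidance above.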
‘K-Pop Hits Through The Years’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 12, 2021
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘K-Pop Hits Through The Years’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-k-pop-hits-through-the-years-0b70/be8b4573/?iid=032-298&v=presentation
Explore at:
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘K-Pop Hits Through The Years’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sberj127/kpop-hits-through-the-years on 12 November 2021.

--- Dataset description provided by original source is as follows ---

What is the data?

The datasets contain the top songs from the era or year given in the name of each dataset. Note that only the KPopHits90s dataset represents an era (1989-2001). Although easily available and reliable sources for the actual K-Pop hits of each year during the 90s are lacking, this era was still included because it was the period when the first generation of K-Pop stars appeared. Each of the other datasets represents a specific year after the 90s.

How was it obtained?

A song is considered to be a K-Pop hit during that era or year if it is included in the annual series of K-Pop Hits playlists, which is created officially by Apple Music. Note that for the dataset that represents the 90s, the playlist 90s K-Pop Essentials was used as the reference.

These playlists were transferred into Spotify through the Tune My Music site. After transferring, the site also presented all the missing songs from each Spotify playlist when compared to the original Apple Music playlists.

Data other than the names and artists of the hit songs was not obtained directly from Apple Music, since these other song details are only available to members of the Apple Developer Program.

The missing songs reported for each playlist were manually searched for and, if found, added to the respective Spotify playlist.

For the songs that were found, there are three types: (1) the song by the original artist, (2) the instrumental of the original song and (3) a cover of the song. When the first type is not found, the two other types are searched and are compared to each other. The one that sounded the most like the original song (from the Apple Music playlist) is chosen as the substitute in the Spotify playlist.

Presented is a link containing all the missing data per playlist (when the initial Spotify playlists were compared to the original Apple Music playlists) and the action done to each one.

The necessary identification details and specific audio features of each track were obtained through the use of the Spotipy library and Spotify Web API documentation.

Why did you make this?

As someone who has a particular curiosity about the field of data science and a genuine love for the musicality of the K-Pop scene, I created this data set to make something out of the strong interest I have in these separate subjects.

Acknowledgements

I would like to express my sincere gratitude to Apple Music for creating the annual K-Pop playlists, Spotify for making their API very accessible, Spotipy for making it easier to get the desired data from the Spotify Web API, Tune My Music for automating the process of transferring one's library into another service's library and, of course, all those involved in the making of these songs and artists included in these datasets for creating such high quality music and concepts digestible even for the general public.

--- Original source retains full ownership of the source dataset ---
Traffic Flow Data Jan to June 2020 SDCC - Dataset - data.smartdublin.ie
data.smartdublin.ie
Updated Feb 16, 2021
(2021). Traffic Flow Data Jan to June 2020 SDCC - Dataset - data.smartdublin.ie [Dataset]. https://data.smartdublin.ie/dataset/traffic-flow-data-jan-to-june-2020-sdcc1
Explore at:
Dataset updated
Feb 16, 2021
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SDCC Traffic Congestion Saturation Flow Data for January to June 2020. Traffic volumes, traffic saturation, and congestion data for sites across South Dublin County. Used by traffic management to control stage timings on junctions. It is recommended that this dataset is read in conjunction with the 'Traffic Data Site Names SDCC' dataset.

A detailed description of each column heading is given below:

scn: Site serial number.
region: A group of nodes that are operated under SCOOT control at the same common cycle time. Normally these will be nodes between which co-ordination is desirable. Some of the nodes may be double cycling at half of the region cycle time.
system: SCOOT STC UTC (UTC-MX).
locn: Location.
ssite: Site number.
sday: Days of the week, Monday to Sunday. Abbreviations: MO, TU, WE, TH, FR, SA, SU.
date: The actual date on which the data was collected.
start_time: Start time of the 15-minute interval. NOTE: Please ignore the date displayed in this column. It is the date on which the report was run and extracted from the system; the actual data collection date is correctly displayed in the 'date' column.
end_time: End time of the 15-minute interval.
flow: A representation of demand (flow) for each link, built up over several minutes by the SCOOT model. SCOOT has two profiles: (1) Short, raw data representing the actual values over the previous few minutes; (2) Long, a smoothed average of values over a longer period. SCOOT will choose to use the appropriate profile depending on a number of factors.
flow_pc: Same as above ref PC SCOOT.
cong: Congestion is directly measured from the detector. If the detector is placed beyond the normal end of queue in the street, it is rarely covered by stationary traffic, except of course when congestion occurs. If any detector shows standing traffic for the whole of an interval, this is recorded. The number of intervals of congestion in any cycle is also recorded. The percentage congestion is calculated from: (No. of congested intervals x 4 x 100) / cycle time in seconds. This percentage of congestion is available to view and, more importantly, for the optimisers to take into account.
cong_pc: Same as above ref PC SCOOT.
dsat: The ratio of the demand flow to the maximum possible discharge flow, i.e. it is the ratio of the demand to the discharge rate (Saturation Occupancy) multiplied by the duration of the effective green time. The Split optimiser will try to minimise the maximum degree of saturation on links approaching the node.
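As a worked example of the percentage congestion formula in the 'cong' column description, here is a minimal Python sketch. It assumes the factor of 4 is the length of one detection interval in seconds, which the description does not state explicitly. The function and parameter names are hypothetical.

```python
def congestion_percent(
    congested_intervals: int,
    cycle_time_seconds: float,
    interval_seconds: float = 4.0,  # assumption: the "x 4" is seconds per interval
) -> float:
    """Percentage congestion = intervals x 4 x 100 / cycle time in seconds."""
    if cycle_time_seconds <= 0:
        raise ValueError("cycle time must be positive")
    return congested_intervals * interval_seconds * 100.0 / cycle_time_seconds
```

For instance, 6 congested intervals in a 120-second cycle give 6 x 4 x 100 / 120 = 20 percent congestion.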
Data from: Analysis of the Quantitative Impact of Social Networks General...
figshare.com
produccioncientifica.ucm.es
doc
Updated Oct 14, 2022
David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1
Explore at:
docAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21329421.v1
Dataset updated
Oct 14, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is this percentage higher or lower than that provided through direct traffic and through the use of search engines via SEO positioning? Which social networks have a greater impact? And is there any relationship between the specific weight of social networks in the web traffic of a cybermedia and circumstances such as the average duration of the user's visit, the number of page views, or the bounce rate, understood in its formal sense of not performing any kind of interaction on the visited page beyond reading its content?

To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that are currently part of the European Union after the United Kingdom left on December 31, 2020. In each nation we selected five media using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use local metrics by country, since the results obtained with these two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, the selected cybermedia are owned by a journalistic company, ruling out those belonging to telecommunications portals or service providers; in some cases they are classic information companies (both newspapers and television channels), while in others they are digital natives, without this circumstance affecting the nature of the proposed research.

We then examined the web traffic data of these cybermedia. The period selected covers the months of October, November and December 2021 and January, February and March 2022. We believe that this six-month stretch smooths out possible one-off variations in a single month, reinforcing the precision of the data obtained. To obtain this data, we used the SimilarWeb tool, currently the most precise tool available for examining the web traffic of a portal, although it is limited to traffic coming from desktops and laptops and does not take into account traffic from mobile devices, which is currently impossible to determine with the measurement tools on the market. It includes:

Web traffic general data: average visit duration, pages per visit and bounce rate
Web traffic origin by country
Percentage of traffic generated from social media over total web traffic
Distribution of web traffic generated from social networks
Comparison of web traffic generated from social networks with direct and search procedures
Comparison of the Predictive Performance and Interpretability of Random...
acs.figshare.com
figshare.com
zip
Updated Jun 5, 2023
Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley (2023). Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets [Dataset]. http://doi.org/10.1021/acs.jcim.6b00753.s006
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1021/acs.jcim.6b00753.s006
Dataset updated
Jun 5, 2023
Dataset provided by
ACS Publications
Authors
Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The ability to interpret the predictions made by quantitative structure–activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for heat map generation.
A comprehensive dataset of website traffic
gimi9.com
data.europa.eu
Updated Jul 14, 2024
(2024). A comprehensive dataset of website traffic [Dataset]. https://gimi9.com/dataset/eu_https-open-bydata-de-api-hub-repo-datasets-https-mediatum-ub-tum-de-1700647-dataset/
Explore at:
Dataset updated
Jul 14, 2024
Description
The dataset contains traffic collected for 96 websites located in
NAT Network Traffic Dataset
ieee-dataport.org
Updated Sep 17, 2020
Sameh Farhat (2020). NAT Network Traffic Dataset [Dataset]. https://ieee-dataport.org/documents/nat-network-traffic-dataset
Explore at:
Dataset updated
Sep 17, 2020
Authors
Sameh Farhat
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Network Address Translation (NAT)
