Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Popular Website Traffic Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Have you ever been in a conversation where the question comes up: who uses Bing? The question comes up occasionally because people wonder whether these sites get any views at all. For this research study, we explore traffic over time for many popular websites.
Methodology
The data collected originates from SimilarWeb.com.
Source
For the analysis and study, go to The Concept Center
This dataset was created by Chase Willden and contains monthly traffic columns such as 1/1/2017, 12/1/2016 and 3/1/2017, a Social Media field, and other technical information.
- Analyze 11/1/2016 in relation to 2/1/2017
- Study the influence of 4/1/2017 on 1/1/2017
- More datasets
If you use this dataset in your research, please credit Chase Willden
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Website Analytics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/ecee4df3-8149-4b74-8927-428ea920b758 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Web traffic statistics for several City-Parish websites (brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data, etc.). Information provided by Google Analytics.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
On a quest to compare different crypto exchanges, I came up with the idea to compare metrics across multiple platforms (at the moment just two). CoinGecko and CoinMarketCap are two of the biggest websites for monitoring both exchanges and crypto projects. In response to over-inflated volumes faked by crypto exchanges, both websites came up with independent metrics for assessing the worth of a given exchange.
Collected on May 10, 2020
CoinGecko's data is a bit more holistic, containing metrics across a multitude of areas (you can read more in the original blog post). The data from CoinGecko consists of the following:
- Exchange Name
- Trust Score (on a scale of N/A-10)
- Type (centralized/decentralized)
- AML (risk: how well prepared are they to handle financial crime?)
- API Coverage (blanket measure that includes: (1) Tickers Data (2) Historical Trades Data (3) Order Book Data (4) Candlestick/OHLC (5) WebSocket API (6) API Trading (7) Public Documentation)
- API Last Updated (when was the API last updated?)
- Bid Ask Spread (average buy/sell spread across all pairs)
- Candlestick (available/not)
- Combined Orderbook Percentile (see above link)
- Estimated_Reserves (estimated holdings of major crypto)
- Grade_Score (overall API score)
- Historical Data (available/not)
- Jurisdiction Risk (risk: risk of terrorist activity/bribery/corruption?)
- KYC Procedures (risk: Know Your Customer?)
- License and Authorization (risk: has the exchange sought regulatory approval?)
- Liquidity (don't confuse with "CMC Liquidity"; THIS column is a combo of (1) web traffic & reported volume (2) order book spread (3) trading activity (4) trust score on trading pairs)
- Negative News (risk: any bad news?)
- Normalized Trading Volume (trading volume normalized to web traffic)
- Normalized Volume Percentile (see above blog link)
- Orderbook (available/not)
- Public Documentation (got a well-documented API available to everyone?)
- Regulatory Compliance (risk rating from a compliance perspective)
- Regulatory Last Updated (last time regulatory metrics were updated)
- Reported Trading Volume (volume as listed by the exchange)
- Reported Normalized Trading Volume (ratio of normalized to reported volume [0-1])
- Sanctions (risk: risk of sanctions?)
- Scale (based on: (1) Normalized Trading Volume Percentile (2) Normalized Order Book Depth Percentile)
- Senior Public Figure (risk: does the exchange have transparent public relations? etc.)
- Tickers (tick tick tick...)
- Trading via API (can data be traded through the API?)
- Websocket (got websockets?)
- Green Pairs (percentage of trading pairs deemed to have good liquidity)
- Yellow Pairs (percentage of trading pairs deemed to have fair liquidity)
- Red Pairs (percentage of trading pairs deemed to have poor liquidity)
- Unknown Pairs (percentage of trading pairs that do not have sufficient order book data)
~
Again, CoinMarketCap only has one metric, which was recently updated and scales from 1-1000 (1000 being very liquid and 1 not). You can go check the article out for yourself. In the dataset, this is the "CMC Liquidity" column, not to be confused with the "Liquidity" column, which refers to the CoinGecko metric!
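Since the dataset puts both scores side by side, a natural first check is whether the two sites rank exchanges similarly. A minimal sketch, assuming both columns are merged into a single CSV (the filename "exchanges.csv" is a placeholder; the column names follow the description above):

```python
import pandas as pd

ex = pd.read_csv("exchanges.csv")  # placeholder filename

# Rank correlation between CoinGecko's composite "Liquidity" score and
# CoinMarketCap's 1-1000 "CMC Liquidity" score.
corr = ex["Liquidity"].corr(ex["CMC Liquidity"], method="spearman")
print(f"Spearman rank correlation: {corr:.2f}")
```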
Thanks to CoinGecko and CMC for making their data scrapable :)
[CMC, you should try to give us a little more access to the figures that define your metric. Thanks!]
The "Phishing Data" dataset is a comprehensive collection of information specifically curated for analyzing and understanding phishing attacks. Phishing attacks involve malicious attempts to deceive individuals or organizations into disclosing sensitive information such as passwords or credit card details. This dataset comprises 18 distinct features that offer valuable insights into the characteristics of phishing attempts. These features include the URL of the website being analyzed, the length of the URL, the use of URL shortening services, the presence of the "@" symbol, the presence of redirection using "//", the presence of prefixes or suffixes in the URL, the number of subdomains, the usage of secure connection protocols (HTTPS), the length of time since domain registration, the presence of a favicon, the presence of HTTP or HTTPS tokens in the domain name, the URL of requested external resources, the presence of anchors in the URL, the number of hyperlinks in HTML tags, the server form handler used, the submission of data to email addresses, abnormal URL patterns, and estimated website traffic or popularity. Together, these features enable the analysis and detection of phishing attempts in the "Phishing Data" dataset, aiding in the development of models and algorithms to combat phishing attacks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DNS over HTTPS (DoH) is becoming a default option for domain resolution in modern privacy-aware software. Research has therefore already focused on various aspects of it; however, a comprehensive dataset from an actual production network is still missing. In this paper, we present a novel dataset comprising multiple PCAP files of DoH traffic. The captured traffic is generated towards various DoH providers to cover differences between DoH server implementations and configurations. In addition to generated traffic, we also provide real network traffic captured on high-speed backbone lines of a large Internet Service Provider with around half a million users. Network identifiers in the real network traffic (e.g., IP addresses and transmitted content), excluding the network identifiers of DoH resolvers, were anonymized; nevertheless, the important characteristics of the traffic can still be obtained from the data and used, e.g., for network traffic classification research. The real network traffic dataset contains DoH as well as non-DoH HTTPS traffic as observed at the collection points in the network.
This repository provides supplementary files for the "Collection of Datasets with DNS over HTTPS Traffic" :
supplementary_files     - Directory with supplementary files (scripts, DoH resolver list) used for dataset creation
├── chrome              - Generation scripts for the Chrome browser and the websites visited during generation
├── doh_resolvers       - The list of DoH resolvers used for filter creation during the ISP backbone capture
├── firefox             - Generation scripts for the Firefox browser and the websites visited during generation
└── pcap-anonymizer     - Anonymization script for the real backbone captures
Collection of datasets:
DoH-Gen-F-AABBC --- https://doi.org/10.5281/zenodo.5957277
DoH-Gen-F-FGHOQS --- https://doi.org/10.5281/zenodo.5957121
DoH-Gen-F-CCDDD --- https://doi.org/10.5281/zenodo.5957420
DoH-Gen-C-AABBCC --- https://doi.org/10.5281/zenodo.5957465
DoH-Gen-C-DDD --- https://doi.org/10.5281/zenodo.5957676
DoH-Gen-C-CFGHOQS --- https://doi.org/10.5281/zenodo.5957659
DoH-Real-world --- https://doi.org/10.5281/zenodo.5956043
https://opendata.vancouver.ca/pages/licence/
This dataset contains the locations of intersections with traffic counts and links to collected data. Information on traffic counts is collected by staff at intersections and includes detailed information by lane and direction. Traffic information is also collected by automated counters at mid-block locations and focuses specifically on direction; that is found in a separate dataset, Directional traffic count locations.
Data currency: This is a static dataset.
Data accuracy: The locations are approximate, either in the intersection of two or more streets or along a block between intersections.
Websites for further information: Traffic count data
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Researchers from the Czech Republic have published a dataset for HTTPS traffic classification.
Since the data were captured mainly in a real backbone network, IP addresses and ports were omitted. The datasets consist of features calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.
During the research, they divided HTTPS traffic into the following categories: L (Live Video Streaming), P (Video Player), M (Music Player), U (File Upload), D (File Download), and W (Website and other traffic).
They have chosen the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. They also used several popular websites that primarily focus on the audience in Czech. The identified traffic classes and their representatives are provided below:
Live Video Stream: Twitch, Czech TV, YouTube Live
Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
Music Player: AppleMusic, Spotify, SoundCloud
File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
Website and Other Traffic: Websites from Alexa Top 1M list
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website; multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) with the mapping between the host's IP address, the csv/pcap filename and the activity label.
Activities:
- Interactive: applications that perform real-time interactions to provide a suitable user experience, such as editing a file in Google Docs or remote CLI sessions over SSH.
- Bulk data transfer: applications that transfer large files over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox or the university repository, among others.
- Web browsing: contains all the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs and news sites and the university's Moodle.
- Video playback: contains traffic from applications that consume video in streaming or pseudo-streaming. The best-known servers used are Twitch and YouTube, but the university's online classroom has also been used.
- Idle behaviour: composed of the background traffic generated by the user's computer when the user is idle. This traffic has been captured with every application closed and with some pages open, like Google Docs, YouTube and several web pages, but always without user interaction.
The capture is performed by a network probe attached, via a SPAN port, to the router that forwards the user's network traffic. The traffic is stored in pcap format with all the packet payload. In the csv files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP address, and source and destination UDP/TCP port. The fields are also included as a header in every csv file.
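As a minimal sketch of working with these per-packet csv files (the filename and the exact header spellings below are assumptions; the real names are given in each file's header):

```python
import pandas as pd

# Placeholder filename; each csv has one line per packet with the fields
# described above (the column names used here are hypothetical).
pkts = pd.read_csv("web_trace_01.csv")
print(pkts.columns.tolist())  # check the actual header names first

# Example: total payload bytes per source/destination address pair
top_pairs = (pkts.groupby(["ip_src", "ip_dst"])["payload_size"]
                 .sum()
                 .sort_values(ascending=False)
                 .head())
print(top_pairs)
```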
The amount of data is stated as follows:
Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
Video: 23 traces, 4496 s, 1405 MBytes
Web: 23 traces, 4203 s, 148 MBytes
Interactive: 42 traces, 8934 s, 30.5 MBytes
Idle: 52 traces, 6341 s, 0.69 MBytes
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is collected by inductive loop detectors deployed on freeways in the Seattle area. The freeways include I-5, I-405, I-90, and SR-520. This data set contains spatiotemporal speed information for the freeway system. At each milepost, the speed information collected from main-lane loop detectors in the same direction is averaged and aggregated into 5-minute interval speed data. The raw data is provided by the Washington State Department of Transportation (WSDOT) and processed by the STAR Lab at the University of Washington according to data quality control and data imputation procedures [1][2].
The data file is a pickle file that can be easily read using the read_pickle() function in the Pandas package. The data forms a matrix in which each cell is the speed value for a specific milepost and time period. The horizontal header denotes the milepost and the vertical header indicates the timestamps. For more information on the definition of milepost, please refer to this website.
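A minimal sketch of loading and inspecting the matrix (the filename below is a placeholder):

```python
import pandas as pd

# Placeholder filename; the matrix has timestamps as the vertical header
# (rows) and mileposts as the horizontal header (columns).
speed = pd.read_pickle("seattle_speed_matrix.pkl")
print(speed.shape)

# Mean speed per milepost over the whole collection period
print(speed.mean(axis=0).head())

# 5-minute speed series at the first milepost
print(speed.iloc[:, 0].describe())
```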
This data set has been used for traffic prediction tasks in several research studies [3][4]. For more detailed information about the data set, you can also refer to this link.
References:
[1]. Henrickson, K., Zou, Y., & Wang, Y. (2015). Flexible and robust method for missing loop detector data imputation. Transportation Research Record, 2527(1), 29-36.
[2]. Wang, Y., Zhang, W., Henrickson, K., Ke, R., & Cui, Z. (2016). Digital roadway interactive visualization and evaluation network applications to WSDOT operational data usage (No. WA-RD 854.1). Washington (State). Dept. of Transportation.
[3]. Cui, Z., Ke, R., & Wang, Y. (2018). Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction. arXiv preprint arXiv:1801.02143.
[4]. Cui, Z., Henrickson, K., Ke, R., & Wang, Y. (2018). Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting. arXiv preprint arXiv:1802.07007.
This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors; a baseline sketch for the 7-day-ahead case follows the variable description below.
The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
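A minimal seasonal-naive baseline for the 7-day-ahead exercise might look like the sketch below. The filename and the column names ("Date", "Unique.Visits") are assumptions about the CSV layout, not taken from the dataset itself:

```python
import pandas as pd

# Hypothetical filename and column names; adjust to the actual CSV layout.
df = pd.read_csv("website_traffic.csv", parse_dates=["Date"]).set_index("Date")
y = df["Unique.Visits"].asfreq("D")

# Seasonal-naive: forecast each day with the value from 7 days earlier,
# exploiting the strong day-of-week seasonality described above.
forecast = y.shift(7)

# Evaluate on the overlap (the first week has no lagged value).
mae = (y - forecast).abs().dropna().mean()
print(f"7-day seasonal-naive MAE: {mae:.1f} unique visitors/day")
```

Any model proposed for the exercises should at least beat this baseline.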
This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.
Traffic data for key Brisbane City Council managed roads. Includes monthly traffic volume, travel time and speed.
Data received as incomplete or incorrect is marked as NA in the report.
Information on Traffic Management is available on the Brisbane City Council website.
This data was previously published in a different format in the following two datasets:
The Data and resources section of this dataset contains further information for this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traffic data for key Brisbane City Council managed roads. Includes monthly averages for traffic volume, travel time and speed. Information on Traffic Management is available on the Brisbane City Council website. This data was previously published in a different format in the following two datasets: Traffic Management — Key Corridor — Average Peak Travel times, and Traffic Management — Key Corridor — Performance Report.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We present a dataset targeting a large set of popular pages (Alexa top-500), from probes in several ISP networks, browser software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments realized in 2019. We purposely collected two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split among WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from the Alexa top-500 worldwide and China. We varied network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements in the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and employed different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the settings diversity, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments. Each archive contains three files and a folder:

- config.csv: description of parameters and conditions for each run
- metrics.csv: values of the different metrics collected by the browser
- progressionCurves.csv: progression curves of the bytes progress as seen by the network, from 0 to 10 seconds in steps of 100 milliseconds
- listUrl folder: indexes the sets of URLs

Regarding config.csv, the columns are:
- index: index for this set of conditions
- location: location of the machine
- listUrl: list of URLs, located in the folder listUrl
- browserUsed: internet browser and version
- terminal: desktop or mobile
- collectionEnvironment: identification of the collection environment
- networkConditionsTrafficShaping (WPT only): whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom emulator conditions)
- networkConditionsBandwidth (WPT only): bandwidth of the network
- networkConditionsDelay (WPT only): delay in the network
- networkConditions (WV only): network conditions
- ipMode (WV only): requested L3 protocol
- requestedProtocol (WV only): requested L7 protocol
- adBlocker (WV only): whether an ad blocker is used or not
- winSize (WV only): window size

Regarding metrics.csv, the columns are:
- id: unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- DOM Content Loaded Event End (ms): DOM time
- First Paint (ms) (WV only): first paint time
- Load Event End (ms): Page Load Time from W3C
- RUM Speed Index (ms) (WV only): RUM Speed Index
- Speed Index (ms) (WPT only): Speed Index
- Time for Full Visual Rendering (ms) (WV only): time for full visual rendering
- Visible portion (%) (WV only): visible portion
- Time to First Byte (ms) (WPT only): time to first byte
- Visually Complete (ms) (WPT only): Visually Complete, used to compute the Speed Index
- aatf, bi_aatf, bi_plt, dom, ii_aatf, ii_plt, last_css, last_img, last_js, nb_ress_css, nb_ress_img, nb_ress_js, num_origins, num_ressources, oi_aatf, oi_plt, plt: metrics computed using the ATF-chrome-plugin

Regarding progressionCurves.csv, the columns are:
- id: unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- url: URL of the current page; SUBPAGE stands for a path
- run: current run (linked with the index of the config for WPT)
- filename: filename of the pcap
- fullname: full name of the pcap
- har_size: size of the HAR for this experiment
- pagedata_size: size of the page data report
- pcap_size: size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the HAR file (in the browser)
- bytesIn_APP: total bytes in as seen in the browser
- bytesIn_NET: total bytes in as seen in the network
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

If you use these datasets in your research, you can reference the appropriate paper:

@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
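As a small usage sketch, the per-experiment byte-completion bins can be turned into a curve directly. This assumes a plain comma-separated file and uses only the column names listed above:

```python
import pandas as pd
import matplotlib.pyplot as plt

curves = pd.read_csv("progressionCurves.csv")
bins = [f"X_bin_{k}_for_B_completion" for k in range(100)]  # 0.0 to 9.9 s

row = curves.iloc[0]  # one experiment
plt.plot([k * 0.1 for k in range(100)], row[bins].to_numpy())
plt.xlabel("time (s)")
plt.ylabel("bytes progress")
plt.title(f"Byte completion, experiment {row['id']}")
plt.show()
```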
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The accompanying data cover all MPD stops, including vehicle, pedestrian, bicycle, and harbor stops, for the period from January 1, 2023 to June 30, 2024. A stop may involve a ticket (actual or warning), investigatory stop, protective pat down, search, or arrest.

If the final outcome of a stop results in an actual or warning ticket, the ticket serves as the official documentation for the stop. The information provided in the ticket includes the subject's name, race, gender, reason for the stop, and duration. All stops resulting in additional law enforcement actions (e.g., pat down, search, or arrest) are documented in MPD's Record Management System (RMS). This dataset includes records pulled from both the ticket (District of Columbia Department of Motor Vehicles [DMV]) and RMS sources. Data variables not applicable to a particular stop are indicated as "NULL." For example, if the stop type ("stop_type" field) is a "ticket stop," then the fields "stop_reason_nonticket" and "stop_reason_harbor" will be "NULL."

Each row in the data represents an individual stop of a single person, and that row reveals any and all recorded outcomes of that stop (including information about any actual or warning tickets issued, searches conducted, arrests made, etc.). A single traffic stop may generate multiple tickets, including actual, warning, and/or voided tickets. Additionally, an individual who is stopped and receives a traffic ticket may also be stopped for investigatory purposes, patted down, searched, and/or arrested. If any of these situations occur, the "stop_type" field would be labeled "Ticket and Non-Ticket Stop." If an individual is searched, MPD differentiates between person and property searches. Please note that the term property in this context refers to a person's belongings and not a physical building. The "stop_location_block" field represents the block-level location of the stop and/or a street name. The age of the person being stopped is calculated based on the time between the person's date of birth and the date of the stop.

There are certain locations that have a high prevalence of non-ticket stops. These can be attributed to some centralized processing locations. Additionally, there is a time lag for data on some ticket stops, as roughly 20 percent of tickets are handwritten. In these instances, the handwritten traffic tickets are delivered by MPD to the DMV and then entered into data systems by DMV contractors.

On August 1, 2021, MPD transitioned to a new version of its current records management system, Mark43 RMS.

Beginning January 1, 2023, fields pertaining to the bureau, division, unit, and PSA (if applicable) of the officers involved in events where a stop was conducted were added to the dataset. MPD's Records Management System (RMS) captures all members associated with the event but cannot isolate which officer (if multiple) conducted the stop itself. Assignments are captured by cross-referencing officers' CAD ID with MPD's Timesheet Manager Application. These fields reflect the assignment of the officer issuing the Notice of Infraction (NOI) and/or the responding officer(s), assisting officer(s), and/or arresting officer(s) (if an investigative stop), as of the end of the two-week pay period for January 1 to June 30, 2023, and as of the date of the stop for July 1, 2023 and forward. The values are comma-separated if multiple officers were listed in the report.

For Stop Type = Harbor and Stop Type = Ticket Only, the officer assignment information will be in the NOI_Officer fields. For Stop Type = Ticket and Non-Ticket, the officer assignments will be in both the NOI_Officer fields (for the officer who issued the NOI) and the RMS_Officer fields (for any other officer involved in the event, which may also be the officer who issued the NOI). For Stop Type = Non-Ticket, the officer assignment information will be in the RMS_Officer fields. Null values in officer assignment fields reflect either Reserve Corps members, whose assignments are not captured in the Timesheet Manager Application, or members who separated from MPD between the time of the stop and the time of the data extraction.

Finally, MPD is conducting ongoing data audits on all data for thorough and complete information. Figures are subject to change due to delayed reporting, ongoing data quality audits, and data improvement processes.
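A hedged sketch of applying these field semantics in practice: treat the literal string "NULL" as missing and select stops by type. The filename is a placeholder; the field names and the "Ticket Only" value follow the description above, but the exact value strings should be verified against the file:

```python
import pandas as pd

# Placeholder filename; per the description, "NULL" marks not-applicable fields.
stops = pd.read_csv("mpd_stops.csv", na_values=["NULL"])

# Ticket-only stops: officer assignment lives in the NOI_Officer fields.
ticket_only = stops[stops["stop_type"] == "Ticket Only"]
print(len(ticket_only), "ticket-only stops")

# For ticket stops, non-ticket reason fields should be missing by construction.
print(ticket_only["stop_reason_nonticket"].isna().mean())
```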
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of multiple different data sources:
We captured the traffic from DoH-enabled web browsers using tcpdump. To automate the process of traffic generation, we installed Google Chrome and Mozilla Firefox into separate virtual machines and controlled them with the Selenium framework (detailed information about the browsers and environments used is provided with the dataset). Selenium simulates a user's browsing according to a predefined script and a list of domain names (i.e., URLs from Alexa's top websites list in our case). Selenium was configured to visit pages in random order multiple times. For capturing the traffic, we used the default settings of each browser. We did not disable the DNS cache of the browser, and the random order of visiting webpages ensures that the dataset contains traces influenced by DNS caching mechanisms. Each virtual machine was configured to export TLS cryptographic keys, which were used for decrypting the traffic with the Wireshark application.
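A hedged sketch of this kind of generation loop (not the authors' actual script; the file paths and the visit count are placeholders) could look like:

```python
import os
import random
import time

from selenium import webdriver

# TLS key export via the standard SSLKEYLOGFILE mechanism; must be set
# before the browser is launched so the process inherits it.
os.environ["SSLKEYLOGFILE"] = "/tmp/tls_keys.log"

# Placeholder domain list file, one domain per line (e.g., from Alexa's top list).
with open("alexa_top_domains.txt") as f:
    urls = ["https://" + line.strip() for line in f if line.strip()]

driver = webdriver.Chrome()
try:
    for _ in range(3):          # visit the list multiple times...
        random.shuffle(urls)    # ...in random order, as described above
        for url in urls:
            driver.get(url)
            time.sleep(2)       # allow the page (and its DoH lookups) to complete
finally:
    driver.quit()
```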
The Wireshark text output of the decrypted traffic is provided in the dataset files. Detailed information about each file is provided in the dataset README.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a link to the NSW Toll Road Data website. The datasets on this website contain traffic data for the following toll roads in Sydney, New South Wales, Australia that are wholly or partly owned by Transurban:
Available data is grouped by quarter for each year, starting from 2009.
Data that populates the Vision Zero View map, which can be found at www.nycvzv.info. Vision Zero is the City's goal for ending traffic deaths and injuries. The Vision Zero action plan can be found at http://www.nyc.gov/html/visionzero/pdf/nyc-vision-zero-action-plan.pdf

Crash data is obtained from the Traffic Accident Management System (TAMS), which is maintained by the New York City Police Department (NYPD). Only crashes with valid geographic information are mapped. All midblock crashes are mapped to the nearest intersection. Injuries and fatalities are grouped by intersection and summarized by month and year. This data is queried and aggregated on a monthly basis and is current as of the query date. Current year data is January to the end of the latest full month. All mappable crash data is represented on the simplified NYC street model. Crashes occurring at complex intersections with multiple roadways are mapped onto a single point. Injury and fatality crashes occurring on highways are excluded from this data.

Please note that this data is preliminary and may contain errors; accordingly, the data on this site is for informational purposes only. Although all attempts to provide the most accurate information are made, errors may be present, and any person who relies upon this data does so at their own risk.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Web Bench: A real-world benchmark for Browser Agents
WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025
Dataset Composition
Category Description Example Count (% of dataset)
READ Tasks that require searching and extracting information “Navigate to the news section and… See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.