80 datasets found

d
Web Traffic Data | Cookieless First Party Opt-In Platform | Capture/Resolve...
datarade.ai
.csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VisitIQ™, Web Traffic Data | Cookieless First Party Opt-In Platform | Capture/Resolve Website Visitors | Pixel | B2B2C 300 Million records | US [Dataset]. https://datarade.ai/data-products/visitiq-web-traffic-data-cookieless-first-party-opt-in-p-visitiq
Explore at:
.csvAvailable download formats
Dataset authored and provided by
VisitIQ™
Area covered
United States of America
Description
Be ready for a cookieless internet while capturing anonymous website traffic data!

By installing the resolve pixel onto your website, business owners can start to put a name to the activity seen in analytics sources (i.e. GA4). With capture/resolve, you can identify up to 40% or more of your website traffic. Reach customers BEFORE they are ready to reveal themselves to you and customize messaging toward the right product or service.

This product will include Anonymous IP Data and Web Traffic Data for B2B2C.

Get a 360 view of the web traffic consumer with their business data such as business email, title, company, revenue, and location.

Super easy to implement and extraordinarily fast at processing, business owners are thrilled with the enhanced identity resolution capabilities powered by VisitIQ's First Party Opt-In Identity Platform. Capture/resolve and identify your Ideal Customer Profiles to customize marketing. Identify WHO is looking, WHAT they are looking at, WHERE they are located and HOW the web traffic came to your site.

Create segments based on specific demographic or behavioral attributes and export the data as a .csv or through S3 integration.

Check our product that has the most accurate Web Traffic Data for the B2B2C market.
Share of global mobile website traffic 2015-2024
statista.com
Updated Jan 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Share of global mobile website traffic 2015-2024 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
Explore at:
Dataset updated
Jan 28, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones consistently hoovered around the 50 percent mark since the beginning of 2017, before surpassing it in 2020. Mobile traffic Due to low infrastructure and financial restraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States. Mobile usage The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
reddit.com Website Traffic, Ranking, Analytics [July 2025]
semrush.com
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). reddit.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/reddit.com/overview/
Explore at:
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://www.semrush.com/company/legal/terms-of-service/https://www.semrush.com/company/legal/terms-of-service/
Time period covered
Aug 12, 2025
Area covered
Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
reddit.com is ranked #5 in US with 4.66B Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!
Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
csv, pdfAvailable download formats
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
w
Cleaned Newly Registered Domain names Count
whoisfreaks.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WhoisFreaks, Cleaned Newly Registered Domain names Count [Dataset]. https://whoisfreaks.com/products/newly-registered-domains
Explore at:
Dataset authored and provided by
WhoisFreaks
License
https://whoisfreaks.com/termshttps://whoisfreaks.com/terms
Time period covered
Aug 25, 2025 - Sep 1, 2025
Area covered
Lahore, Pakistan
Description
Cleaned Newly Registered Domain names Count offers an up-to-date overview of newly registered domains, including both generic top-level domains (gTLDs) and country-code top-level domains (ccTLDs). Stay up-to-dated and make data-driven decisions in the domain industry with our regularly updated dataset.
Leading websites worldwide 2024, by monthly visits
statista.com
Updated Mar 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
Explore at:
Dataset updated
Mar 24, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Nov 2024
Area covered
Worldwide
Description
In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period. The internet leaders: search, social, and e-commerce Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene. What is next for online content? Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signal that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
f
WP-Script | Web Hosting & Domain Names | Technology Data
datastore.forage.ai
Updated Nov 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). WP-Script | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
Explore at:
Dataset updated
Nov 20, 2024
Description
WP-Script is a company that provides WordPress themes and plugins for creating adult sites. They offer a range of products, including seven customizable adult WordPress themes and thirteen powerful adult WordPress plugins. Their products are designed to be easy to use and can help entrepreneurs create professional-looking adult sites with minimal technical expertise.

With WP-Script, you can start your adult site in six easy steps. They also offer a 14-day money-back guarantee, giving you the opportunity to test their products risk-free. Additionally, they provide premium support to help you resolve any issues you may encounter. Their customers love their products, citing excellent themes, easy installation, and good customer support.
w
General Newly Registered Domains Count
whoisfreaks.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WhoisFreaks, General Newly Registered Domains Count [Dataset]. https://whoisfreaks.com/products/newly-registered-domains
Explore at:
Dataset authored and provided by
WhoisFreaks
License
https://whoisfreaks.com/termshttps://whoisfreaks.com/terms
Time period covered
Aug 25, 2025 - Sep 1, 2025
Area covered
Pakistan, Lahore
Description
General Newly Registered Domains Count provides a comprehensive record of newly registered domains across different zones, encompassing both generic top-level domains (gTLDs) and country-code top-level domains (ccTLDs). Updated daily to include today's count, this dataset offers valuable insights into the evolving domain registration landscape. Whether you're tracking market trends or conducting research, our dataset equips you with the latest information to make informed decisions in the domain industry.

A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large...

zenodo.org

json

Updated Dec 11, 2024

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Radek Hranický; Radek Hranický; Jan Polišenský; Jan Polišenský; Adam Horák; Petr Pouč; Petr Pouč; Kamil Jeřábek; Kamil Jeřábek; Tomáš Ebert; Adam Horák; Tomáš Ebert (2024). A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large Corpus of Benign, Phishing, and Malware Domain Names 2024 [Dataset]. http://doi.org/10.5281/zenodo.14332167

Explore at:

jsonAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.14332167

Dataset updated

Dec 11, 2024

Dataset provided by

Zenodo

Authors

Radek Hranický; Radek Hranický; Jan Polišenský; Jan Polišenský; Adam Horák; Petr Pouč; Petr Pouč; Kamil Jeřábek; Kamil Jeřábek; Tomáš Ebert; Adam Horák; Tomáš Ebert

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Aug 16, 2024

Description

The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from the actual CESNET network traffic, 164,425 phishing domains from PhishTank and OpenPhish services, and 100,809 malware domains from various sources like ThreatFox, The Firebog, MISP threat intelligence platform, and other sources. The ground truth for the phishing dataset was double-check with the VirusTotal (VT) service. Domain names not considered malicious by VT have been removed from phishing and malware datasets. Similarly, benign domain names that were considered risky by VT have been removed from the benign datasets. The data was collected between March 2023 and July 2024. The final assessment of the data was conducted in August 2024.

The dataset is useful for cybersecurity research, e.g. statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing and malware website detection.

The dataset was created using software available in the associated GitHub repository nesfit/domainradar-dib.

Data Files

The data is located in the following individual files:
- benign_umbrella.json - data for 368,956 benign domains from Cisco Umbrella,
- benign_cesnet.json - data for 461,338 benign domains from the CESNET network,
- phishing.json - data for 164,425 phishing domains, and
- malware.json - data for 100,809 malware domains.
The schema.json file contains a JSON Schema with detailed description of the data entries.

Data Structure

Both files contain a JSON array of records generated using mongoexport (in the MongoDB Extended JSON (v2) format in Relaxed Mode). The following table documents the structure of a record. Please note that:

some fields may be missing (they should be interpreted as nulls),
extra fields may be present (they should be ignored).

Field name	Field type	Nullable	Description
domain_name	String	No	The evaluated domain name
url	String	No	The source URL for the domain name
evaluated_on	Date	No	Date of last collection attempt
source	String	No	An identifier of the source
sourced_on	Date	No	Date of ingestion of the domain name
dns	Object	Yes	Data from DNS scan
rdap	Object	Yes	Data from RDAP or WHOIS
tls	Object	Yes	Data from TLS handshake
ip_data	Array of Objects	Yes	Array of data objects capturing the IP addresses related to the domain name
malware_type	String	No	The malware type/family or “unknown” (only present in malware.json)
DNS data (dns field)
A	Array of Strings	No	Array of IPv4 addresses
AAAA	Array of Strings	No	Array of IPv6 addresses
TXT	Array of Strings	No	Array of raw TXT values
CNAME	Object	No	The CNAME target and related IPs
MX	Array of Objects	No	Array of objects with the MX target hostname, priority and related IPs
NS	Array of Objects	No	Array of objects with the NS target hostname and related IPs
SOA	Object	No	All the SOA fields, present if found at the target domain name
zone_SOA	Object	No	The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly
dnssec	Object	No	Flags describing the DNSSEC validation result for each record type
ttls	Object	No	The TTL values for each record type
remarks	Object	No	The zone domain name and DNSSEC flags
RDAP data (rdap field)
copyright_notice	String	No	RDAP/WHOIS data usage copyright notice
dnssec	Bool	No	DNSSEC presence flag
entitites	Object	No	An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities.
expiration_date	Date	Yes	The current date of expiration
handle	String	No	RDAP handle
last_changed_date	Date	Yes	The date when the domain was last changed
name	String	No

youtube.com Website Traffic, Ranking, Analytics [July 2025]
semrush.com
stb2.digiseotools.com
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). youtube.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/youtube.com/overview/
Explore at:
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://www.semrush.com/company/legal/terms-of-service/https://www.semrush.com/company/legal/terms-of-service/
Time period covered
Aug 12, 2025
Area covered
YouTube, Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
youtube.com is ranked #1 in KR with 47.12B Traffic. Categories: Newspapers, Online Services. Learn more about website traffic, market share, and more!
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
Use of paid content on the internet website "test.de" in Germany 2023
statista.com
Updated Jul 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Use of paid content on the internet website "test.de" in Germany 2023 [Dataset]. https://www.statista.com/statistics/1368949/paid-content-use-test-de-germany/
Explore at:
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Germany
Description
In 2023, the German website test.de had recorded ******* paid views. This was the highest figure since 2009. Test.de is the website of the consumer organization Stiftung Warentest, which tests and compares goods and services.
w
Specific Newly Registered Domain names Count
whoisfreaks.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WhoisFreaks, Specific Newly Registered Domain names Count [Dataset]. https://whoisfreaks.com/products/newly-registered-domains
Explore at:
Dataset authored and provided by
WhoisFreaks
License
https://whoisfreaks.com/termshttps://whoisfreaks.com/terms
Time period covered
Aug 25, 2025 - Sep 1, 2025
Area covered
Lahore, Pakistan
Description
Specific Newly Registered Domains Count provides an up-to-date overview of newly registered domains, including both generic top-level domains (gTLDs) and country-code top-level domains (ccTLDs). Stay informed and make data-driven decisions in the domain industry with our regularly updated dataset, focusing on TLDs and ccTLDs.
E
Data from: WMT17 Quality Estimation Shared Test Data
live.european-language-grid.eu
binary format
Updated Apr 12, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2017). WMT17 Quality Estimation Shared Test Data [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1176
Explore at:
binary formatAvailable download formats
Dataset updated
Apr 12, 2017
License
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
Description
Test data for the WMT17 QE task. Train data can be downloaded from http://hdl.handle.net/11372/LRT-1974

This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-editions by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include:

- To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets.
- To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.

This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. A in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available under request. The data is publicly available but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications on the use of this data for research purposes.
etsy.com Website Traffic, Ranking, Analytics [July 2025]
semrush.com
semrush.ebundletools.com
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). etsy.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/etsy.com/overview/
Explore at:
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://www.semrush.com/company/legal/terms-of-service/https://www.semrush.com/company/legal/terms-of-service/
Time period covered
Aug 12, 2025
Area covered
Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
etsy.com is ranked #55 in US with 326.39M Traffic. Categories: Retail, Online Services. Learn more about website traffic, market share, and more!
S
Global Website Down Checker Market Investment Landscape 2025-2032
statsndata.org
excel, pdf
Updated Jul 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stats N Data (2025). Global Website Down Checker Market Investment Landscape 2025-2032 [Dataset]. https://www.statsndata.org/report/website-down-checker-market-328712
Explore at:
excel, pdfAvailable download formats
Dataset updated
Jul 2025
Dataset authored and provided by
Stats N Data
License
https://www.statsndata.org/how-to-orderhttps://www.statsndata.org/how-to-order
Area covered
Global
Description
The Website Down Checker market has become an increasingly vital sector within the digital landscape, as businesses of all sizes seek to ensure their online presence remains accessible and reliable. These tools help quickly identify when a website is not functioning, providing immediate alerts that can save companie
Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 31, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was obtained in a Kali Machine from University of Cincinnati,Cincinnati,OHIO by carrying out packet captures for 1 hour during the evening on Oct 9th,2023 using Wireshark.This dataset consists of 394137 instances were obtained and stored in a CSV (Comma Separated Values) file.This large dataset could be used utilised for different machine learning applications for instance classification of Network traffic,Network performance monitoring,Network Security Management , Network Traffic Management ,network intrusion detection and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features.Each instance contains the information of source and destination IP addresses, The majority of the properties are numeric in nature, however there are also nominal and date kinds due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No : Number of Instance. Timestamp : Timestamp of instance of network traffic Source IP: IP address of Source Destination IP: IP address of Destination Portocol: Protocol used by the instance Length: Length of Instance Info: Information of Traffic Instance

Acknowledgements :

I would like thank University of Cincinnati for giving the infrastructure for generation of network traffic data set.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration : This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP,ARP,RARP) that an IP flow contains. Instead, it generates machine learning models that can identify specific applications (like Tiktok,Wikipedia,Instagram,Youtube,Websites,Blogs etc.) from IP flow statistics (there are currently 25 applications in total).

**Dataset License: ** CC0: Public Domain

Dataset Usages : This dataset can be used for different machine learning applications in the field of cybersecurity such as classification of Network traffic,Network performance monitoring,Network Security Management , Network Traffic Management ,network intrusion detection and anomaly detection.

ML techniques benefits from this Dataset :

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on a public,private and Enterprise networks.Also,the dataset consists of very important features that can be used for most of the applications of Machine learning in cybersecurity.Here are few of the potential machine learning applications that could be benefited from this dataset are :

Network Performance Monitoring : This large network traffic data set can be utilised for analysing the network traffic to identifying the network patterns in the network .This help in designing the network security algorithms for minimise the network probelms.

Anamoly Detection : Large network traffic dataset can be utilised training the machine learning models for finding the irregularitues in the traffic which could help identify the cyber attacks.

3.Network Intrusion Detection : This large dataset could be utilised for machine algorithms training and designing the models for detection of the traffic issues,Malicious traffic network attacks and DOS attacks as well.
Z
CARLA Simulation Datasets for Training, Validation, and Test Data of the...
data.niaid.nih.gov
zenodo.org
Updated Jan 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shaikh, Hamdaan Asif (2024). CARLA Simulation Datasets for Training, Validation, and Test Data of the project "Out-Of-Domain Data Detection using Uncertainty Quantification in End-to-End Driving Algorithms" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10511420
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Shaikh, Hamdaan Asif
Description
These are CARLA Simulation Datasets of the project "Out-Of-Domain Data Detection using Uncertainty Quantification in End-to-End Driving Algorithms". The simulations are generated in CARLA Town 02 for different sun angles (in degrees). You will find image frames, command labels, and steering control values in the respective 'xxxx_files_data' folder. You will find videos of each simulation run in the 'xxxx_files_visualizations' folder.

The 8 simulation runs for Training Data, are with the Sun Angles : 90, 80, 70, 60, 50, 40, 30, 20

The 8 simulation runs for Training Data were seeded at 0000, 1000, 2000, 3000, 4000, 5000, 6000, 7000 respectively

The 4 simulation runs for Validation Data, are with the Sun Angles : 87, 67, 47, 23

The 4 simulation runs for Validation Data were seeded at 0000, 2000, 4000, 7000 respectively

The 29 simulation runs for Testing Data, are with the Sun Angles : 85, 75, 65, 55, 45, 35, 25, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 09, 08, 07, 06, 05, 04, 03, 02, 01, 00, -1, -10

The 29 simulation runs for Testing Data were all seeded at 5000 respectively
e
Full-population web crawl of .gov.uk web domain, 2014 - Dataset - B2FIND
b2find.eudat.eu
Updated Aug 10, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). Full-population web crawl of .gov.uk web domain, 2014 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/2811fbd9-62e3-5722-a5c6-27f17928f3de
Explore at:
Dataset updated
Aug 10, 2019
Description
This dataset is the result of a full-population crawl of the .gov.uk web domain, aiming to capture a full picture of the scope of public-facing government activity online and the links between different government bodies. Local governments have been developing online services, aiming to better serve the public and reduce administrative costs. However, the impact of this work, and the links between governments’ online and offline activities, remain uncertain. The overall research question for this research examines whether local e-government has met these expectations, of Digital Era Governance and of its practitioners. Aim was to directly analyse the structure and content of government online. It shows that recent digital-centric public administration theories, typified by the Digital Era Governance quasi-paradigm, are not empirically supported by the UK local government experience. The data consist of a file of individual Uniform Resource Locators (URLs) fetched during the crawl, and a further file containing pairs of URLs reflecting the Hypertext Markup Language (HTML) links between them. In addition, a GraphML format file is presented for a version of the data reduced to third-level-domains, with accompanying attribute data for the publishing government organisations and calculated webometric statistics based on the third-level-domain link network.This project engages with the Digital Era Governance (DEG) work of Dunleavy et. al. and draws upon new empirical methods to explore local government and its use of Internet-related technology. It challenges the existing literature, arguing that e-government benefits have been oversold, particularly for transactional services; it updates DEG with insights from local government. The distinctive methodological approach is to use full-population datasets and large-scale web data to provide an empirical foundation for theoretical development, and to test existing theorists’ claims. A new full-population web crawl of .gov.uk is used to analyse the shape and structure of online government using webometrics. Tools from computer science, such as automated classification, are used to enrich our understanding of the dataset. A new full-population panel dataset is constructed covering council performance, cost, web quality, and satisfaction. The local government web shows a wide scope of provision but only limited evidence in support of the existing rhetorics of Internet-enabled service delivery. In addition, no evidence is found of a link between web development and performance, cost, or satisfaction. DEG is challenged and developed in light of these findings. The project adds value by developing new methods for the use of big data in public administration, by empirically challenging long-held assumptions on the value of the web for government, and by building a foundation of knowledge about local government online to be built on by further research. This is an ESRC-funded DPhil research project. A web crawl was carried out with Heritrix, the Internet Archive's web crawler. A list of all registered domains in .gov.uk (and their www.x.gov.uk equivalents) was used as a set of start seeds. Sites outside .gov.uk were excluded; robots.txt files were respected, with the consequence that some .gov.uk sites (and some parts of other .gov.uk sites) were not fetched. Certain other areas were manually excluded, particularly crawling traps (e.g. calendars which will serve infinite numbers of pages in the past and future and those websites returning different URLs for each browser session) and the contents of certain large peripheral databases such as online local authority library catalogues. A full set of regular expressions used to filter the URLs fetched are included in the archive. On completion of the crawl, the page URLs and link data were extracted from the output WARC files. The page URLs were manually examined and re-filtered to handle various broken web servers and to reduce duplication of content where multiple views were presented onto the same content (for example, where a site was presented at both http://organisation.gov.uk/ and http://www.organisation.gov.uk/ without HTTP redirection between the two). Finally, The link list was filtered against the URL list to remove bogus links and both lists were map/reduced to a single set of files. Also included in this data release is a derived dataset more useful for high-level work. This is a GraphML file containing all the link and page information reduced to third-level domain level (so darlington.gov.uk is considered as a single node, not a large set of pages) and with the links binarised to present/not present between each node. Each graph node also has various attributes, including the name of the registering organisation and various webometric measures including PageRank, indegree and betweenness centrality.

Multi-Domain Dataset for Robots (MDDRobots) - Multi-Domain Indoor Dataset...

zenodo.org

bin, zip

Updated May 20, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Piotr Wozniak; Piotr Wozniak; Tomasz Krzeszowski; Tomasz Krzeszowski; Bogdan Kwolek; Bogdan Kwolek (2025). Multi-Domain Dataset for Robots (MDDRobots) - Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots [Dataset]. http://doi.org/10.5281/zenodo.15340287

Explore at:

bin, zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.15340287

Dataset updated

May 20, 2025

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Piotr Wozniak; Piotr Wozniak; Tomasz Krzeszowski; Tomasz Krzeszowski; Bogdan Kwolek; Bogdan Kwolek

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

License

The MDDRobots dataset is made available under the CC BY 4.0 license https://creativecommons.org/licenses/by/4.0/.

Summary

The Multi-Domain Dataset for Robots (MDDRobots) contains data for computer vision problems, indoor visual place recognition, and anomaly detection. The recorded images are from different cameras and indoor environmental conditions.

It is obligatory to cite the following paper in every work that uses the dataset:
Wozniak, P., Krzeszowski, T. & Kwolek, B. Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots. Sci Data 12, 817 (2025). https://doi.org/10.1038/s41597-025-05124-3

Data description

The data are divided into five sets (containing data for different cameras), which have further subsets. Each of the subsets: Training, Test 1, Test 2, and Test 3 consists of nine image sequences. A total of 89,550 three-channel RGB color images in PNG format are organized into 20 zip folders with a whole size of 34.3 GB. Each image in the sequence has a label that represents a room. The number of images for each subset differs due to the split into training and testing data. The difference also results from different methods of recording the image sequences. In order to have balanced data in the subsets, each room in the sequence has the same number of images. Different environmental changes were introduced in each subset. The data from Test 1 are closest to those from the training set. The differences between the sequences are mainly due to changes in the route, robot, and recording equipment. The rooms are well lighted, but not overexposed. The sequences from Test 3 present changed conditions, such as a different time of day, a changed lighting system, and intensive layout changes. The key change is the different paths of the human and the robot. This means a different perspective from previously recorded scenes. The Test 2 sequences pose the most difficult challenge because they contain various recorded activities performed by people moving around rooms. People can occlude important parts of the scene and pass in front of the camera. The images were anonymized by manually blurring the faces of observed people.

Dataset structure

RobotPiCamera_DataSet
- DataSet_RobotPiCamera_RGB_train
- DataSet_RobotPiCamera_RGB_test1
- DataSet_RobotPiCamera_RGB_test2
- DataSet_RobotPiCamera_RGB_test3
Xtion_DataSet
- DataSet_XTION_RGB_train
- DataSet_XTION_RGB_test1
- DataSet_XTION_RGB_test2
- DataSet_XTION_RGB_test3
GOPRO_DataSet
- DataSet_GOPRO_RGB_train
- DataSet_GOPRO_RGB_test1
- DataSet_GOPRO_RGB_test2
- DataSet_GOPRO_RGB_test3
iPhone_DataSet
- DataSet_IPHONE_RGB_train
- DataSet_IPHONE_RGB_test1
- DataSet_IPHONE_RGB_test2
- DataSet_IPHONE_RGB_test3
P40PRO_DataSet
- DataSet_P40PRO_RGB_train
- DataSet_P40PRO_RGB_test1
- DataSet_P40PRO_RGB_test2
- DataSet_P40PRO_RGB_test3

Example folder content: DataSet_P40PRO_RGB_train\Corridor1_RGB - 00000000.png, 00000001.png, 00000002.png, 00000003.png, ... 00000599.png.

Total Images (Images per Place)

Subset	Mounted	Training	Test 1	Test 2	Test 3
Pi Camera	Robot	7200 (800)	5400 (600)	5400 (600)	5400 (600)
Xtion	Robot	7200 (800)	1800 (200)	1800 (200)	1800 (200)
GoPro	Hand	5400 (600)	4500 (500)	4500 (500)	4500 (500)
iPhone	Hand	5400 (600)	4500 (500)	4500 (500)	4500 (500)
P40Pro	Hand	5400 (600)	4050 (450)	3150 (350)	3150 (350)

Further information

For any questions, comments or other issues please contact Piotr Woźniak

Facebook

Twitter

Click to copy link

Link copied

Cite

Web Traffic Data | Cookieless First Party Opt-In Platform | Capture/Resolve Website Visitors | Pixel | B2B2C 300 Million records | US

Explore at:

.csvAvailable download formats

Dataset authored and provided by

VisitIQ™

Area covered

United States of America

Description

Be ready for a cookieless internet while capturing anonymous website traffic data!

By installing the resolve pixel onto your website, business owners can start to put a name to the activity seen in analytics sources (i.e. GA4). With capture/resolve, you can identify up to 40% or more of your website traffic. Reach customers BEFORE they are ready to reveal themselves to you and customize messaging toward the right product or service.

This product will include Anonymous IP Data and Web Traffic Data for B2B2C.

Get a 360 view of the web traffic consumer with their business data such as business email, title, company, revenue, and location.

Super easy to implement and extraordinarily fast at processing, business owners are thrilled with the enhanced identity resolution capabilities powered by VisitIQ's First Party Opt-In Identity Platform. Capture/resolve and identify your Ideal Customer Profiles to customize marketing. Identify WHO is looking, WHAT they are looking at, WHERE they are located and HOW the web traffic came to your site.

Create segments based on specific demographic or behavioral attributes and export the data as a .csv or through S3 integration.

Check our product that has the most accurate Web Traffic Data for the B2B2C market.

Clear search

Close search

Google apps

Main menu

Web Traffic Data | Cookieless First Party Opt-In Platform | Capture/Resolve...

Share of global mobile website traffic 2015-2024

reddit.com Website Traffic, Ranking, Analytics [July 2025]

Website Statistics

Cleaned Newly Registered Domain names Count

Leading websites worldwide 2024, by monthly visits

WP-Script | Web Hosting & Domain Names | Technology Data

General Newly Registered Domains Count

A Dataset of Information (DNS, IP, WHOIS/RDAP, TLS, GeoIP) for a Large...

Data Files

Data Structure

youtube.com Website Traffic, Ranking, Analytics [July 2025]

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Use of paid content on the internet website "test.de" in Germany 2023

Specific Newly Registered Domain names Count

Data from: WMT17 Quality Estimation Shared Test Data

etsy.com Website Traffic, Ranking, Analytics [July 2025]

Global Website Down Checker Market Investment Landscape 2025-2032

Network Traffic Dataset

CARLA Simulation Datasets for Training, Validation, and Test Data of the...

Full-population web crawl of .gov.uk web domain, 2014 - Dataset - B2FIND

Multi-Domain Dataset for Robots (MDDRobots) - Multi-Domain Indoor Dataset...

License

Summary

Data description

Dataset structure

Further information

Web Traffic Data | Cookieless First Party Opt-In Platform | Capture/Resolve Website Visitors | Pixel | B2B2C 300 Million records | US