13 datasets found

Network Traffic Analysis: Data and Code
data.niaid.nih.gov
zenodo.org
Updated Jun 12, 2024
Cite
Homan, Sophia (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
Dataset updated
Jun 12, 2024
Dataset provided by
Ferrell, Nathan
Homan, Sophia
Moran, Madeline
Soni, Shreena
Chan-Tin, Eric
Honig, Joshua
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help    show this help message and exit
-i TXTFILE    input text file
-x X          Add first X number of total packets as features.
-y Y          Add first Y number of negative packets as features.
-z Z          Add first Z number of positive packets as features.
-ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S          Generate samples using size s.
-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.
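
The -x, -y and -z options described above can be illustrated with a minimal sketch. This is not the code in Features.py: the function name first_n_features, the zero-padding of short traces, and the sample trace are all assumptions made for illustration.

```python
from typing import List


def first_n_features(packets: List[int], x: int = 0, y: int = 0, z: int = 0) -> List[int]:
    """Build a feature vector from the leading packets of one trace.

    packets holds signed sizes (negative = incoming, positive = outgoing).
    Short traces are zero-padded so every vector has length x + y + z;
    the padding rule is an assumption of this sketch.
    """
    def take(values: List[int], n: int) -> List[int]:
        head = values[:n]
        return head + [0] * (n - len(head))

    negatives = [p for p in packets if p < 0]
    positives = [p for p in packets if p > 0]
    return take(packets, x) + take(negatives, y) + take(positives, z)


# Hypothetical trace: first 3 packets overall, first 2 incoming, first 3 outgoing.
trace = [565, -1500, -1500, 120, -60]
features = first_n_features(trace, x=3, y=2, z=3)
print(features)
```

Padding keeps every row the same width, which most classifiers require; the real script may handle short traces differently.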

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5-fold cross-validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to find the best hyperparameters with a grid search. Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'. Once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'.

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest classifier on the provided data and reports results using cross-validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
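
As a rough illustration of the 5-fold cross-validation step, the sketch below parses lines in the websiteNumber1,feature1,feature2,... format and averages accuracy over the folds. It is not machineLearning.py: the helper names are hypothetical, integer labels are assumed, and a majority-class baseline stands in for the random forest so the example runs with the standard library only.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, List[float]]  # (label, features)


def parse_ml_line(line: str) -> Row:
    """Parse one 'label,feature1,feature2,...' line (integer label assumed)."""
    label, *features = line.strip().split(",")
    return int(label), [float(f) for f in features]


def k_fold_accuracy(rows: Sequence[Row],
                    fit_predict: Callable[[Sequence[Row], Sequence[Row]], List[int]],
                    k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k folds; each row is held out exactly once.

    Assumes len(rows) >= k so that no fold is empty.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in indices if i not in held]
        test = [rows[i] for i in held_out]
        predictions = fit_predict(train, test)
        correct = sum(p == label for p, (label, _) in zip(predictions, test))
        scores.append(correct / len(test))
    return sum(scores) / k


def majority_baseline(train: Sequence[Row], test: Sequence[Row]) -> List[int]:
    """Stand-in model: always predict the most common training label."""
    most_common = Counter(label for label, _ in train).most_common(1)[0][0]
    return [most_common] * len(test)


# Synthetic rows, purely for demonstration.
lines = [f"{i % 2},{i},{i * 2}" for i in range(20)]
rows = [parse_ml_line(line) for line in lines]
accuracy = k_fold_accuracy(rows, majority_baseline, k=5)
print(f"mean accuracy: {accuracy:.2f}")
```

Any model can be swapped in through the fit_predict callback, which trains on the first argument and returns one predicted label per row of the second.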

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

The first number is a classification number denoting which website, query, or VR action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets
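
The line format described above can be read with a few lines of Python. This is a minimal sketch, not the authors' parser: the Trace type and parse_trace function are hypothetical names, and the separator (comma or whitespace) is an assumption because the description does not state it.

```python
from typing import List, NamedTuple


class Trace(NamedTuple):
    label: int            # classification number (website, query, or VR action)
    incoming: List[int]   # sizes of packets recorded as negative numbers
    outgoing: List[int]   # sizes of packets recorded as positive numbers


def parse_trace(line: str) -> Trace:
    """Split one line into its class label and per-direction packet sizes."""
    # Accept commas or whitespace as separators (an assumption of this sketch).
    tokens = line.replace(",", " ").split()
    label = int(tokens[0])
    packets = [int(t) for t in tokens[1:]]
    incoming = [-p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    return Trace(label, incoming, outgoing)


# Hypothetical line: class 7 followed by five signed packet sizes.
trace = parse_trace("7 565 -1500 -1500 120 -60")
print(trace)
```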

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
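
The description does not say whether the CDF is empirical or a normal CDF built from the mean and standard deviation, so the sketch below computes both for one packet capture. The function name cdf_table, the use of the sample standard deviation, and the example sizes are assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def cdf_table(sizes: List[int]) -> List[Tuple[int, float, float]]:
    """Return (size, empirical CDF, normal CDF) rows for one packet capture.

    Needs at least two distinct sizes so the standard deviation is positive.
    """
    ordered = sorted(sizes)                       # smallest to largest
    mu, sigma = mean(ordered), stdev(ordered)     # sample standard deviation
    normal = NormalDist(mu, sigma)
    n = len(ordered)
    # Empirical CDF: fraction of packets with size <= s (ties share a value).
    return [(s, bisect_right(ordered, s) / n, normal.cdf(s)) for s in ordered]


# Hypothetical packet sizes from a single capture.
table = cdf_table([1500, 60, 565, 120])
for size, empirical, fitted in table:
    print(size, round(empirical, 2), round(fitted, 3))
```

Plotting either CDF column against the size column gives a curve of the kind a CDF figure would show.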
Competitor Analysis Evaluation Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 2, 2025
Cite
Data Insights Market (2025). Competitor Analysis Evaluation Report [Dataset]. https://www.datainsightsmarket.com/reports/competitor-analysis-evaluation-1987684
Available download formats: ppt, pdf, doc
Dataset updated
Jun 2, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The competitive landscape of the website analytics market, encompassing players like Google, BuiltWith, SEMrush, and others, is dynamic and characterized by significant growth. The market's size in 2025 is estimated at $15 billion, reflecting a Compound Annual Growth Rate (CAGR) of 15% from 2019. This robust growth is driven by increasing reliance on data-driven decision-making across businesses, expanding digital marketing strategies, and the rise of e-commerce. Key trends include the integration of AI and machine learning for more sophisticated analysis, the increasing demand for real-time data, and a growing focus on personalized user experiences. While the market faces constraints such as data privacy concerns and the complexity of integrating diverse data sources, the overall outlook remains highly positive. The market is segmented by solution type (website analytics, social media analytics, app analytics), deployment mode (cloud, on-premise), and enterprise size (small, medium, large). Companies are focusing on developing advanced analytical capabilities, strengthening partnerships, and expanding their global reach to maintain their competitive edge. The competitive analysis reveals a clear dominance by established players such as Google Analytics, leveraging its massive user base and comprehensive feature set. However, specialized tools like SEMrush and Ahrefs cater to niche needs like SEO analysis and backlink profiling. Smaller players often differentiate themselves through specialized features, superior customer support, or cost-effectiveness, carving out space within the market. Future market share will largely depend on the ability of companies to innovate, adapt to changing privacy regulations, and successfully integrate cutting-edge technologies like AI and machine learning into their offerings. The competition is expected to intensify further with the emergence of new players and the constant evolution of analytical techniques. 
Strategic mergers and acquisitions are also likely to reshape the market structure in the coming years.
Data from: Annual Average Daily Traffic
gisdata-caltrans.opendata.arcgis.com
data.ca.gov
Updated Sep 30, 2024
Cite
California_Department_of_Transportation (2024). Annual Average Daily Traffic [Dataset]. https://gisdata-caltrans.opendata.arcgis.com/datasets/d8833219913c44358f2a9a71bda57f76
Dataset updated
Sep 30, 2024
Dataset provided by
California Department of Transportation (http://dot.ca.gov/)
Authors
California_Department_of_Transportation
Description
Annual average daily traffic is the total volume for the year divided by 365 days. The traffic count year is from October 1st through September 30th. Very few locations in California are actually counted continuously. Traffic counting is generally performed by electronic counting instruments moved from location to location throughout the State in a program of continuous traffic count sampling. The resulting counts are adjusted to an estimate of annual average daily traffic by compensating for seasonal influence, weekly variation, and other variables that may be present. Annual ADT is necessary for presenting a statewide picture of traffic flow, evaluating traffic trends, computing accident rates, planning and designing highways, and other purposes. See the Traffic Census Program page.
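
The definition above reduces to a simple division, and the adjustment of sampled counts can be pictured as scaling by correction factors. The sketch below is illustrative only: the multiplicative form, the factor values, and the function names are assumptions, not the method used by the Traffic Census Program.

```python
def annual_average_daily_traffic(total_annual_volume: float, days: int = 365) -> float:
    """AADT as defined in the description: yearly volume divided by 365 days."""
    return total_annual_volume / days


def adjusted_estimate(sample_daily_average: float,
                      seasonal_factor: float,
                      weekly_factor: float) -> float:
    """Scale a short-term sample toward an annual average.

    Multiplicative factors are an assumed form of the seasonal and weekly
    compensation mentioned in the description.
    """
    return sample_daily_average * seasonal_factor * weekly_factor


aadt = annual_average_daily_traffic(3_650_000)       # hypothetical yearly volume
estimate = adjusted_estimate(12_000, 0.9, 1.05)      # hypothetical factors
print(aadt, round(estimate))
```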
Competitor Analysis Evaluation Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 16, 2025
Cite
Archive Market Research (2025). Competitor Analysis Evaluation Report [Dataset]. https://www.archivemarketresearch.com/reports/competitor-analysis-evaluation-59567
Available download formats: pdf, ppt, doc
Dataset updated
Mar 16, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global website analytics market, encompassing solutions for large enterprises and SMEs, is poised for significant growth. While the provided data lacks specific market size and CAGR figures, a reasonable estimation based on industry trends suggests a 2025 market size of approximately $15 billion, experiencing a compound annual growth rate (CAGR) of 12% from 2025 to 2033. This robust growth is fueled by several key drivers: the increasing reliance on data-driven decision-making across businesses, the escalating need for enhanced website performance optimization, and the growing adoption of sophisticated analytics tools offering deeper insights into user behavior and conversion rates. Market segmentation reveals strong demand across diverse analytics types, including product, traffic, and sales analytics. The competitive landscape is intensely dynamic, with established players like Google, SEMrush, and SimilarWeb vying for market share alongside emerging innovative companies like Owletter and TrendSource. These companies are constantly innovating to provide more comprehensive and user-friendly analytics platforms, leading to increased competition. This competitive pressure fosters innovation, but also necessitates strategic differentiation, focusing on specific niche markets or offering unique features to attract and retain customers. The market’s geographic distribution shows significant traction in North America and Europe, but emerging markets in Asia Pacific are also exhibiting substantial growth potential, driven by increasing internet penetration and digital transformation initiatives. While data security concerns and the complexity of implementing analytics tools present some restraints, the overall market outlook remains highly positive, promising considerable opportunities for market participants in the coming years.
Most common content marketing effectiveness evaluation methods in Russia...
statista.com
Updated Jul 18, 2025
Cite
Statista (2025). Most common content marketing effectiveness evaluation methods in Russia 2022 [Dataset]. https://www.statista.com/statistics/1310835/most-common-content-marketing-evaluation-methods-russia/
Dataset updated
Jul 18, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2022 - Mar 2022
Area covered
Russia
Description
In 2022, about ***** out of ten representatives of business companies and advertising agencies in Russia measured content marketing effectiveness by analyzing digital metrics. Furthermore, ** percent of the survey participants tracked website traffic and identified the source of the content. Nearly ** percent did not evaluate content marketing efficiency.
Anonymized Two-Way Traffic Packet Header Traces 100G (5 sec) sampler
catalog.caida.org
Updated Jan 14, 2025
Cite
CAIDA (2025). Anonymized Two-Way Traffic Packet Header Traces 100G (5 sec) sampler [Dataset]. https://catalog.caida.org/dataset/passive_100g_sampler
Dataset updated
Jan 14, 2025
Dataset authored and provided by
CAIDA
License
https://www.caida.org/about/legal/aua/
Time period covered
Nov 2024
Description
This dataset contains anonymized layer 1-4 packet headers of two-way passive traces captured on a 100 GB link between Los Angeles and San Jose. These data are useful for research on the characteristics of Internet traffic, including application breakdown, security events, geographic and topological distribution, flow volume and duration.

Passive 100G sampler is offered to researchers at commercial organizations when they request Anonymized Internet Traces. These data are part of the 2024 Anonymized Traces 100G dataset. The files consist of 5 second snapshots of a bidirectional capture taken in November 2024.
Data from: Improving the efficacy of web-based educational outreach in...
datadryad.org
data.niaid.nih.gov
zip
Updated Aug 19, 2015
Cite
Gregory R. Goldsmith; Andrew D. Fulton; Colin D. Witherill; Javier F. Espeleta (2015). Improving the efficacy of web-based educational outreach in ecology [Dataset]. http://doi.org/10.5061/dryad.94nk8
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.94nk8
Dataset updated
Aug 19, 2015
Dataset provided by
Dryad
Authors
Gregory R. Goldsmith; Andrew D. Fulton; Colin D. Witherill; Javier F. Espeleta
Time period covered
Aug 19, 2014
Description
Scientists are increasingly engaging the web to provide formal and informal science education opportunities. Despite the prolific growth of web-based resources, systematic evaluation and assessment of their efficacy remains limited. We used clickstream analytics, a widely available method for tracking website visitors and their behavior, to evaluate >60,000 visits over three years to an educational website focused on ecology. Visits originating from search engine queries were a small proportion of the traffic, suggesting the need to actively promote websites to drive visitation. However, the number of visits referred to the website per social media post varied depending on the social media platform and the quality of those visits (e.g., time on site and number of pages viewed) was significantly lower than visits originating from other referring websites. In particular, visitors referred to the website through targeted promotion (e.g., inclusion in a website listing classroom teaching...
Online ad effectiveness evaluation indicators in Russia 2021, by type
statista.com
Updated Jul 8, 2025
Cite
Statista (2025). Online ad effectiveness evaluation indicators in Russia 2021, by type [Dataset]. https://www.statista.com/statistics/1058771/online-ad-effectiveness-evaluation-by-businesses-russia/
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2021
Area covered
Russia
Description
The most common indicator to consider while assessing the effectiveness of online brand advertisements in Russia in 2021 was the website traffic, as per ** percent of surveyed company representatives. Furthermore, approximately ********** measured it by the frequency of search queries related to their brands. About ** percent of participants stated their enterprises evaluated the success of performance ads by checking the number of clicks on the website.
Truck Average Daily Traffic
gis.data.ca.gov
data.ca.gov
Updated Sep 30, 2024
Cite
California_Department_of_Transportation (2024). Truck Average Daily Traffic [Dataset]. https://gis.data.ca.gov/maps/c079bdd6a2c54aec84b6b2f7d6570f6d_0
Dataset updated
Sep 30, 2024
Dataset authored and provided by
California_Department_of_Transportation
Description
Annual average daily traffic is the total volume for the year divided by 365 days. The truck count year is from October 1st through September 30th. Very few locations in California are actually counted continuously. Truck counting is generally performed by electronic counting instruments moved from location to location throughout the State in a program of continuous traffic count sampling. The resulting counts are adjusted to an estimate of annual average daily traffic by compensating for seasonal influence, weekly variation, and other variables that may be present. Annual ADT is necessary for presenting a statewide picture of traffic flow, evaluating traffic trends, computing accident rates, planning and designing highways, and other purposes.
GTT23: A 2023 Dataset of Genuine Tor Traces
zenodo.org
data.niaid.nih.gov
Updated Apr 11, 2024
Cite
Rob Jansen; Ryan Wails; Aaron Johnson (2024). GTT23: A 2023 Dataset of Genuine Tor Traces [Dataset]. http://doi.org/10.5281/zenodo.10620520
Unique identifier
https://doi.org/10.5281/zenodo.10620520
Dataset updated
Apr 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rob Jansen; Ryan Wails; Aaron Johnson
Time period covered
2023
Description
The GTT23 dataset contains network metadata of encrypted traffic measured from exit relays in the Tor network over a 13-week measurement period in 2023. The metadata is suitable for analyzing and evaluating website fingerprinting attacks and defenses.

Our dataset measurement process was designed to prioritize safety and privacy and was developed through consultation with the Tor Research Safety Board (TRSB, submission #37). Our TRSB interaction resulted in a “No Objections” score.

The measurement process, additional safety and ethical considerations, and a statistical analysis of the dataset will be presented in further detail in a forthcoming publication.
The Groves - Low Traffic Neighbourhood Trial | gimi9.com
gimi9.com
Updated Jul 27, 2021
Cite
(2021). The Groves - Low Traffic Neighbourhood Trial | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_the-groves-low-traffic-neighbourhood-trial/
Dataset updated
Jul 27, 2021
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
In June 2020, the decision was taken to implement a low traffic neighbourhood trial in The Groves. For more information on the trial, please visit City of York Council's website. Independent monitoring and evaluation work has been commissioned by CYC to assess the impact of the trial and inform future decisions on the experimental road closures in The Groves. Part of this work uses traffic surveys, which are available in this dataset. The dataset includes:
• baseline surveys for the week before the start of the trial (week 1),
• surveys for the first two weeks of the trial (weeks 2 and 3),
• surveys from approximately a year after the start of the trial (included in The Groves Traffic Analysis),
• bus journey time data before and during the trial (The Groves Bus Analysis).
Weekly road traffic collision (AXA Mexico) & weekly road traffic deaths
data.niaid.nih.gov
datadryad.org
zip
Updated Aug 21, 2023
Cite
Carolina Pérez Ferrer (2023). Weekly road traffic collision (AXA Mexico) & weekly road traffic deaths [Dataset]. http://doi.org/10.5061/dryad.dfn2z3540
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.dfn2z3540
Dataset updated
Aug 21, 2023
Dataset provided by
Instituto Nacional de Salud Pública
Authors
Carolina Pérez Ferrer
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Mexico
Description
Dataset 1 (AXA collisions 2015–2019) was curated and used to evaluate the effect of two road traffic regulations implemented in Mexico City in 2015 and 2019 on collisions using an interrupted time series analysis. Collisions data came from insurance collision claims (January 2015 to December 2019). The dataset contains 8 variables: year (anio_n), week (semana), count of total collisions per week (c_total), count of collisions resulting in injury per week (c_p_lesion), binary variable to identify the 2015 intervention (limit), binary variable to identify the 2019 intervention (limit1), the number of weeks from baseline (time), an estimate of the number of insured vehicles per week (veh_a_cdmx). Dataset 2 (Road traffic deaths 2013–2019) was curated and used to evaluate the effect of two road traffic regulations implemented in Mexico City in 2015 and 2019 on mortality using an interrupted time series analysis. Mortality data came from vital registries collated by the Mexican Institute for Geography and Statistics, INEGI, (January 2013 to December 2019). The dataset contains 7 variables: year (anio_ocur), week (semana), count of traffic-related deaths per week (def_trans), binary variable to identify the 2015 intervention (limit), binary variable to identify the 2019 intervention (limit1), the number of weeks from baseline (time) and an estimate of the Mexico City population per week (pob_tot_p). Methods Dataset 1 arises from publicly available data on insurance-reported collisions published on the website of the International Institute for Data Science (see reference below). The data were collected by claims adjusters from the company AXA at the site of the collision using an electronic device. 
These data were available for public use from January 2015 to December 2019 and include information on individual collisions and their characteristics: date the collision occurred, location (coordinates and adjuster reported location), type of vehicle involved and whether there were injuries or deaths. Data were processed and cleaned, mapping collisions, and keeping only those georeferenced within Mexico City boundaries as well as coded to Mexico City in the reported location variable. We then summed the number of collisions per week and merged it with data on an estimate of the number of insured registered vehicles per week (using information from registered vehicles and proportion of insured vehicles from the Mexican Association of Insurance companies). Two more variables were created, one that identifies the week when the intervention came into effect and another variable to number the weeks since baseline. This dataset contains all the necessary information to conduct the interrupted time series analysis for total collisions and collisions resulting in injuries. Dataset 2: mortality data were validated and reported by INEGI (see reference below) from death certificates filed mainly by the Health Sector, using the International Classification of Disease, 10th Revision (ICD-10) for diagnosis codes. We used data from January 2013 to December 2019 and included deaths with the following ICD-10 codes: V02-V04 (.1-.9), V09, V092, V09.3, V09.9, V12-V14 (.3-.9), V19.4-V19.6, V19.9, V20-V28 (.3-.9), V29, V30-V39, V40-V79 (.4-.9), V80.3-V80.5, V81.1, V82.1, V82.1, V83-V86 (.0-.3), V87-V89.2 and V89.9. We summed the number of traffic-related deaths per week and merged it with data on an estimate of the total population in Mexico City per week (see refs below). Two more variables were created, one that identifies the week when the intervention came into effect and another variable to number the weeks since baseline. 
This dataset contains all the necessary information to conduct the interrupted time series analysis for road traffic deaths. References to original data:

Instituto Internacional de Ciencia de Datos. Datos AXA de Percances Viales [Internet]. 2020 [July 2021]. Available from: https://i2ds.org/datos-abiertos/. Instituto Nacional de Geografía y Estadística. Parque Vehicular [Internet]. 2019 [July 2021]. Available from: https://www.inegi.org.mx/temas/vehiculos/default.html#Tabulados. Dirección Ejecutiva de Líneas de Negocio área de Automóviles. Sistema Estadístico del Sector Asegurador del ramo Automóviles SESA 2018. Mexico City: Asociación Mexicana de Instituciones de Seguro, 2020. Instituto Nacional de Geografía y Estadística. Mortalidad [Internet]. 2020 [July 2021]. Available from: https://www.inegi.org.mx/programas/mortalidad/default.html#Datos_abiertos.

World Health Organisation. ICD-10 Version:2010 [Internet]. 2010 [July 2021]. Available from: https://icd.who.int/browse10/2010/en. Consejo Nacional de Población. Proyecciones de la Población de México y de las Entidades Federativas, 2016-2050 [Internet]. 2018 [July 2021]. Available from: https://datos.gob.mx/busca/dataset/proyecciones-de-la-poblacion-de-mexico-y-de-las-entidades-federativas-2016-2050.
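
The weekly aggregation described in the Methods for this dataset (summing events per week, numbering weeks from baseline, and flagging the intervention week) can be sketched as follows. This is an illustration, not the authors' processing code: ISO weeks, a time index starting at 1, the function names, and the example dates are all assumptions; only the column names time and limit follow the variable list in the description.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


def weekly_counts(event_dates: Iterable[date]) -> Dict[Week, int]:
    """Count events per ISO week."""
    counts: Counter = Counter()
    for d in event_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


def add_its_columns(counts: Dict[Week, int], intervention_week: Week) -> List[dict]:
    """Add a running week index (time) and a binary intervention flag (limit)."""
    rows = []
    for t, week in enumerate(sorted(counts), start=1):
        rows.append({
            "week": week,
            "count": counts[week],
            "time": t,                                # weeks since baseline
            "limit": int(week >= intervention_week),  # 1 from the intervention on
        })
    return rows


# Hypothetical collision dates and intervention week.
dates = [date(2015, 1, 5), date(2015, 1, 6), date(2015, 1, 12)]
weekly = weekly_counts(dates)
rows = add_its_columns(weekly, intervention_week=(2015, 3))
print(rows)
```

Rows in this shape are what a segmented regression for an interrupted time series analysis would typically take as input.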
SEO Testing Service Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 22, 2024
Cite
Dataintelo (2024). SEO Testing Service Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-seo-testing-service-market
Available download formats: csv, pptx, pdf
Dataset updated
Sep 22, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
SEO Testing Service Market Outlook

The global SEO Testing Service market size was valued at approximately USD 1.5 billion in 2023 and is projected to reach USD 4.3 billion by 2032, growing at a CAGR of 12.5% during the forecast period. The growth of this market can be attributed to the increasing importance of digital marketing and the need for businesses to optimize their online presence to stay competitive.
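
The growth figures quoted above can be cross-checked with the standard compound annual growth rate formula. The sketch assumes nine compounding years (2023 to 2032); the report does not state its base year, so a slightly different period could account for the quoted 12.5%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# USD 1.5 billion in 2023 growing to USD 4.3 billion by 2032.
rate = cagr(1.5, 4.3, 2032 - 2023)
print(f"{rate:.1%}")
```

Under these assumptions the rate comes out at roughly 12.4%, which is in line with the reported figure.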

The primary growth factor driving the SEO Testing Service market is the rapid digital transformation across various industries. As more businesses shift their operations online, the need to ensure that their websites perform well in search engine rankings becomes crucial. This has led to a surge in demand for SEO testing services that can help businesses identify and rectify issues affecting their search engine performance. Additionally, the increasing complexity of search engine algorithms necessitates regular testing and optimization, further fueling market growth.

Another significant growth factor is the rise of e-commerce and online retail. With the proliferation of online shopping, businesses are keen on enhancing their visibility on search engines to attract more customers. SEO testing services play a vital role in this by helping e-commerce platforms optimize their websites for better search engine rankings, thereby driving traffic and sales. Moreover, the growth of mobile internet usage has also spurred the demand for mobile SEO testing services, as businesses aim to ensure that their websites are mobile-friendly and perform well on mobile search engines.

The growing awareness about the importance of local SEO is also contributing to the market's expansion. As businesses, particularly small and medium enterprises (SMEs), recognize the value of appearing in local search results, they are increasingly investing in local SEO testing services. This trend is especially prominent in industries like retail, healthcare, and hospitality, where local search visibility can significantly impact customer acquisition and revenue generation.

From a regional perspective, North America is expected to hold the largest market share during the forecast period, driven by the high adoption of digital marketing strategies and the presence of numerous SEO service providers in the region. The Asia Pacific region is anticipated to witness the highest growth rate, fueled by the rapid digitalization of businesses in countries like China and India. Europe, Latin America, and the Middle East & Africa are also expected to experience steady growth, driven by increasing investments in digital marketing and the growing awareness of the benefits of SEO testing services.

Service Type Analysis

On-Page SEO Testing services are pivotal in scrutinizing the elements on a webpage that impact search engine rankings. These services evaluate factors such as content quality, keyword usage, meta tags, images, and internal links. The growing complexity of search engine algorithms has made on-page optimization more critical than ever. Businesses are increasingly leveraging these services to ensure that their webpages meet the latest SEO standards and guidelines. The demand for on-page SEO testing is particularly high among content-driven websites and blogs, where content quality and relevance are paramount for search engine visibility.

Off-Page SEO Testing services focus on external factors that influence a website's search engine rankings, such as backlinks, social signals, and online reputation. These services help businesses identify and build high-quality backlinks, which are crucial for improving search engine rankings. With search engines placing more emphasis on the credibility and authority of websites, off-page SEO testing has become an essential tool for businesses looking to enhance their online presence. The demand for these services is robust among companies engaged in competitive industries, where building a strong backlink profile can provide a significant advantage.

Technical SEO Testing services are designed to identify and resolve technical issues that may hinder a website's search engine performance. These services cover aspects such as website speed, mobile-friendliness, structured data, and crawlability. With search engines like Google prioritizing user experience, technical SEO testing has become indispensable for businesses aiming to optimize their websites for better performance. The increasing complexity of website architectures and the growing emphasis on page speed and mobile optimization are driving the demand f