59 datasets found
  1. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 12, 2024
    Cite
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Loyola University Chicago
    Authors
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j
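    The -x, -y and -z options describe a simple feature layout. The following is a minimal sketch of that idea, not the code shipped in Features.py: the function names, the zero padding for short captures, and the sample packet list are all assumptions made for illustration.

```python
from typing import List


def first_n(values: List[int], n: int, pad: int = 0) -> List[int]:
    """Return the first n values, padded with `pad` when fewer are available.

    Padding is an assumption; the original script may handle short captures differently.
    """
    head = values[:n]
    return head + [pad] * (n - len(head))


def build_features(packets: List[int], x: int = 0, y: int = 0, z: int = 0) -> List[int]:
    """Concatenate the first x packets, first y negative packets, and first z positive packets."""
    negative = [p for p in packets if p < 0]
    positive = [p for p in packets if p > 0]
    return first_n(packets, x) + first_n(negative, y) + first_n(positive, z)


# Hypothetical capture: signed packet sizes in arrival order.
example = [-1500, 60, -1500, 52, -800]
features = build_features(example, x=3, y=2, z=3)
print(features)
```

    With this sample, the third positive slot is padded because the capture contains only two positive packets.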

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
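    For readers unfamiliar with 5-fold cross validation, the sketch below shows how sample indices can be partitioned into five train/test splits using only the standard library. It is an illustration under assumptions (the function name, the shuffling seed, and the round-robin fold assignment are hypothetical) and is not taken from machineLearning.py.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(10, k=5)
for train, test in splits:
    print(len(train), len(test))
```

    Each sample appears in exactly one test fold, so averaging the per-fold accuracy uses every sample for evaluation once.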

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for each experiment was stored and analyzed as a txt file in which each line contains:

    • first, a classification number denoting which website, query, or VR action is taking place;
    • then, in the remaining numbers, the size of each packet and the direction it is traveling:
      negative numbers denote incoming packets, and
      positive numbers denote outgoing packets.
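    A line in this format can be read with a few lines of Python. This is a hedged sketch: the description does not state the delimiter, so the parser below accepts whitespace or commas, and the function name and sample line are made up for illustration.

```python
import re
from typing import List, Tuple


def parse_capture_line(line: str) -> Tuple[int, List[int], List[int]]:
    """Return (class label, incoming sizes, outgoing sizes) for one line.

    Assumes integers separated by whitespace and/or commas (delimiter not confirmed by the source).
    """
    numbers = [int(tok) for tok in re.split(r"[,\s]+", line.strip()) if tok]
    label, packets = numbers[0], numbers[1:]
    incoming = [-p for p in packets if p < 0]  # negative values are incoming packets
    outgoing = [p for p in packets if p > 0]   # positive values are outgoing packets
    return label, incoming, outgoing


# Hypothetical line: class 3 followed by four signed packet sizes.
label, incoming, outgoing = parse_capture_line("3 -1500 60 -1500 52")
print(label, incoming, outgoing)
```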

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
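    The sort, mean, standard deviation, and CDF steps can be sketched as follows. This is an illustration rather than the spreadsheet's actual formulas: the sample sizes are invented, and the use of the sample (n - 1) standard deviation is an assumption, since the description does not say which variant was used.

```python
import statistics
from typing import List, Tuple


def empirical_cdf(sizes: List[int]) -> List[Tuple[int, float]]:
    """Sort packet sizes and pair each with the fraction of packets at or below it."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(size, (i + 1) / n) for i, size in enumerate(ordered)]


# Hypothetical packet sizes from one capture (absolute values).
sizes = [60, 1500, 52, 800]
mean = statistics.mean(sizes)
std = statistics.stdev(sizes)  # sample standard deviation (assumption)
cdf = empirical_cdf(sizes)
print(mean, round(std, 2))
print(cdf)
```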

  2. Leading K12 and test preparation platforms in India 2022, by website traffic...

    • statista.com
    Cite
    Statista, Leading K12 and test preparation platforms in India 2022, by website traffic [Dataset]. https://www.statista.com/statistics/1413860/india-k12-and-test-preparation-platforms-by-website-traffic/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2022 - Sep 2022
    Area covered
    India
    Description

    Between July and September 2022, BYJU's emerged as the top EdTech platform for K12 and test preparation in India. It recorded approximately *** million website visits. Following closely behind was Toppr.com, with around *** million visits during the same period.

  3. Leading website traffic in Kenya 2021, by device

    • statista.com
    Updated Feb 15, 2022
    Cite
    Statista (2022). Leading website traffic in Kenya 2021, by device [Dataset]. https://www.statista.com/statistics/1316963/web-traffic-distribution-of-leading-websites-in-kenya-by-device/
    Explore at:
    Dataset updated
    Feb 15, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2021
    Area covered
    Kenya
    Description

    In November 2021, mobile devices accounted for nearly ** percent of the web traffic to Google.com in Kenya. The website had the highest number of total visits in the country. Among the leading websites, most of them had a higher share of traffic from mobile. Youtube.com was an exception, with only ********* of its traffic originating from mobile devices.

  4. tester.ma Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 11, 2025
    Cite
    Semrush (2025). tester.ma Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/tester.ma/overview/
    Explore at:
    Dataset updated
    Nov 11, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 11, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    tester.ma is ranked #25269 in DZ with 10.45K Traffic. Categories: . Learn more about website traffic, market share, and more!

  5. Share of global mobile website traffic 2015-2025

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Share of global mobile website traffic 2015-2025 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the second quarter of 2025, mobile devices (excluding tablets) accounted for 62.54 percent of global website traffic. Since consistently maintaining a share of around 50 percent beginning in 2017, mobile usage surpassed this threshold in 2020 and has demonstrated steady growth in its dominance of global web access.

    Mobile traffic

    Due to low infrastructure and financial restraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.

    Mobile usage

    The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.

  6. Recipe Site Traffic: Analysis & Prediction

    • kaggle.com
    Updated Sep 21, 2025
    Cite
    Michael Matta (2025). Recipe Site Traffic: Analysis & Prediction [Dataset]. https://www.kaggle.com/datasets/michaelmatta0/recipe-site-traffic-analysis-and-prediction
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle
    Authors
    Michael Matta
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset originates from DataCamp. Many users have reposted copies of the CSV on Kaggle, but most of those uploads omit the original instructions, business context, and problem framing. In this upload, I’ve included that missing context in the About Dataset so the reader of my notebook or any other notebook can fully understand how the data was intended to be used and the intended problem framing.

    Note: I have also uploaded a visualization of the workflow I personally took to tackle this problem, but it is not part of the dataset itself. Additionally, I created a PowerPoint presentation based on my work in the notebook, which you can download from here:
    PPTX Presentation

    Recipe Site Traffic

    From: Head of Data Science
    Received: Today
    Subject: New project from the product team

    Hey!

    I have a new project for you from the product team. Should be an interesting challenge. You can see the background and request in the email below.

    I would like you to perform the analysis and write a short report for me. I want to be able to review your code as well as read your thought process for each step. I also want you to prepare and deliver the presentation for the product team - you are ready for the challenge!

    They want us to predict which recipes will be popular 80% of the time and minimize the chance of showing unpopular recipes. I don't think that is realistic in the time we have, but do your best and present whatever you find.

    You can find more details about what I expect you to do here. And information on the data here.

    I will be on vacation for the next couple of weeks, but I know you can do this without my support. If you need to make any decisions, include them in your work and I will review them when I am back.

    Good Luck!

    From: Product Manager - Recipe Discovery
    To: Head of Data Science
    Received: Yesterday
    Subject: Can you help us predict popular recipes?

    Hi,

    We haven't met before but I am responsible for choosing which recipes to display on the homepage each day. I have heard about what the data science team is capable of and I was wondering if you can help me choose which recipes we should display on the home page?

    At the moment, I choose my favorite recipe from a selection and display that on the home page. We have noticed that traffic to the rest of the website goes up by as much as 40% if I pick a popular recipe. But I don't know how to decide if a recipe will be popular. More traffic means more subscriptions so this is really important to the company.

    Can your team:
    - Predict which recipes will lead to high traffic?
    - Correctly predict high traffic recipes 80% of the time?

    We need to make a decision on this soon, so I need you to present your results to me by the end of the month. Whatever your results, what do you recommend we do next?

    Look forward to seeing your presentation.

    About Tasty Bytes

    Tasty Bytes was founded in 2020 in the midst of the Covid Pandemic. The world wanted inspiration so we decided to provide it. We started life as a search engine for recipes, helping people to find ways to use up the limited supplies they had at home.

    Now, over two years on, we are a fully fledged business. For a monthly subscription we will put together a full meal plan to ensure you and your family are getting a healthy, balanced diet whatever your budget. Subscribe to our premium plan and we will also deliver the ingredients to your door.

    Example Recipe

    This is an example of how a recipe may appear on the website, we haven't included all of the steps but you should get an idea of what visitors to the site see.

    Tomato Soup

    Servings: 4
    Time to make: 2 hours
    Category: Lunch/Snack
    Cost per serving: $

    Nutritional Information (per serving)
    - Calories 123
    - Carbohydrate 13g
    - Sugar 1g
    - Protein 4g

    Ingredients:
    - Tomatoes
    - Onion
    - Carrot
    - Vegetable Stock

    Method:
    1. Cut the tomatoes into quarters….

    Data Information

    The product manager has tried to make this easier for us and provided data for each recipe, as well as whether there was high traffic when the recipe was featured on the home page.

    As you will see, they haven't given us all of the information they have about each recipe.

    You can find the data here.

    I will let you decide how to process it, just make sure you include all your decisions in your report.

    Don't forget to double check the data really does match what they say - it might not.

    Column Name     Details
    recipe          Numeric, unique identifier of recipe
    calories        Numeric, number of calories
    carbohydrate    Numeric, amount of carbohydrates in grams
    sugar           Numeric, amount of sugar in grams
    protein         Numeric, amount of prote...
  7. email-checker.net Website Traffic, Ranking, Analytics [September 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Cite
    Semrush (2025). email-checker.net Website Traffic, Ranking, Analytics [September 2025] [Dataset]. https://semrush.ebundletools.com/website/email-checker.net/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    email-checker.net is ranked #50125 in US with 641.86K Traffic. Categories: Computer Software and Development, Information Technology, Online Services. Learn more about website traffic, market share, and more!

  8. Monthly website traffic on nykaa.com 2024

    • statista.com
    Updated Feb 15, 2024
    Cite
    Statista (2024). Monthly website traffic on nykaa.com 2024 [Dataset]. https://www.statista.com/statistics/1242055/nykaa-website-traffic/
    Explore at:
    Dataset updated
    Feb 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Aug 2023 - Jan 2024
    Area covered
    India
    Description

    In the month of January 2024, the beauty and personal care retailer Nykaa had about **** million website visits. In comparison, the month of December in 2023 clocked over ten million monthly website visits.

  9. iq-checker.xyz Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Cite
    Semrush (2025). iq-checker.xyz Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/iq-checker.xyz/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    iq-checker.xyz is ranked #15394 in TR with 63.51K Traffic. Categories: . Learn more about website traffic, market share, and more!

  10. mail-tester.com Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Aug 1, 2025
    Cite
    (2025). mail-tester.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/mail-tester.com
    Explore at:
    Dataset updated
    Aug 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank
    Description

    Traffic analytics, rankings, and competitive metrics for mail-tester.com as of August 2025

  11. av-test.org Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Sep 1, 2025
    Cite
    (2025). av-test.org Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/av-test.org
    Explore at:
    Dataset updated
    Sep 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank
    Description

    Traffic analytics, rankings, and competitive metrics for av-test.org as of September 2025

  12. Data from: Traffic Volumes

    • data.sandiego.gov
    Updated Jul 29, 2016
    + more versions
    Cite
    (2016). Traffic Volumes [Dataset]. https://data.sandiego.gov/datasets/traffic-volumes/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 29, 2016
    Description

    The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate of the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency of each count varies depending on City staff’s needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.

  13. test-velocidad.com Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 11, 2025
    Cite
    Semrush (2025). test-velocidad.com Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/test-velocidad.com/overview/
    Explore at:
    Dataset updated
    Nov 11, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 11, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    test-velocidad.com is ranked #27469 in ES with 62.33K Traffic. Categories: Information Technology, Telecom. Learn more about website traffic, market share, and more!

  14. test-ipv6.com Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Sep 1, 2025
    Cite
    (2025). test-ipv6.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/test-ipv6.com
    Explore at:
    Dataset updated
    Sep 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank
    Description

    Traffic analytics, rankings, and competitive metrics for test-ipv6.com as of September 2025

  15. Chicago Traffic Tracker - Congestion Estimates by Segments

    • catalog.data.gov
    • data.cityofchicago.org
    • +4more
    Updated Nov 29, 2025
    Cite
    data.cityofchicago.org (2025). Chicago Traffic Tracker - Congestion Estimates by Segments [Dataset]. https://catalog.data.gov/dataset/chicago-traffic-tracker-congestion-estimates-by-segments
    Explore at:
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    data.cityofchicago.org
    Area covered
    Chicago
    Description

    This dataset contains the current estimated speed for about 1250 segments covering 300 miles of arterial roads. For a more detailed description, please go to https://tas.chicago.gov, click the About button at the bottom of the page, and then the MAP LAYERS tab. The Chicago Traffic Tracker estimates traffic congestion on Chicago’s arterial streets (nonfreeway streets) in real-time by continuously monitoring and analyzing GPS traces received from Chicago Transit Authority (CTA) buses. Two types of congestion estimates are produced every ten minutes: 1) by Traffic Segments and 2) by Traffic Regions or Zones. Congestion estimate by traffic segments gives the observed speed typically for one-half mile of a street in one direction of traffic. Traffic Segment level congestion is available for about 300 miles of principal arterials. Congestion by Traffic Region gives the average traffic condition for all arterial street segments within a region. A traffic region is comprised of two or three community areas with comparable traffic patterns. 29 regions are created to cover the entire city (except O’Hare airport area). This dataset contains the current estimated speed for about 1250 segments covering 300 miles of arterial roads. There is much volatility in traffic segment speed. However, the congestion estimates for the traffic regions remain consistent for relatively longer period. Most volatility in arterial speed comes from the very nature of the arterials themselves. Due to a myriad of factors, including but not limited to frequent intersections, traffic signals, transit movements, availability of alternative routes, crashes, short length of the segments, etc. speed on individual arterial segments can fluctuate from heavily congested to no congestion and back in a few minutes. The segment speed and traffic region congestion estimates together may give a better understanding of the actual traffic conditions.

  16. Total global visitor traffic to amazon.com 2024

    • statista.com
    Cite
    Statista, Total global visitor traffic to amazon.com 2024 [Dataset]. https://www.statista.com/statistics/623566/web-visits-to-amazoncom/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    In March 2024, Amazon.com had approximately 2.2 billion combined web visits, up from 2.1 billion visits in February. In the fourth quarter of 2024, Amazon’s net income amounted to approximately 20 billion U.S. dollars.

    Online retail in the United States

    Online retail in the United States is constantly growing. In the third quarter of 2023, e-commerce sales accounted for 15.6 percent of retail sales in the United States. During that quarter, U.S. retail e-commerce sales amounted to over 284 billion U.S. dollars. Amazon is the leading online store in the country, in terms of e-commerce net sales. Amazon.com generated around 130 billion U.S. dollars in online sales in 2022. Walmart ranked as the second-biggest online store, with revenues of 52 billion U.S. dollars.

    The king of Black Friday

    In 2023, Amazon ranked as U.S. shoppers' favorite place to go shopping during Black Friday, even surpassing in-store purchasing. Nearly six out of ten consumers chose Amazon as the number one place to go find the best Black Friday deals. Similar findings can be observed in the United Kingdom (UK), where Amazon is also ranked as the preferred Black Friday destination.

  17. gpc-check.com Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Cite
    Semrush (2025). gpc-check.com Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/gpc-check.com/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    gpc-check.com is ranked #5072 in JP with 608.47K Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!

  18. pNEUMA dataset

    • zenodo.org
    html, zip
    Updated Jan 16, 2024
    Cite
    Emmanouil Barmpounakis; Nikolas Geroliminis (2024). pNEUMA dataset [Dataset]. http://doi.org/10.5281/zenodo.10491409
    Explore at:
    Available download formats: zip, html
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emmanouil Barmpounakis; Nikolas Geroliminis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    pNEUMA is an open large-scale dataset of naturalistic trajectories of half a million vehicles, collected in a one-of-a-kind experiment by a swarm of drones in the congested downtown area of Athens, Greece. It offers a unique observatory of traffic congestion, at a scale an order of magnitude higher than what was previously available, which researchers from different disciplines around the globe can use to develop and test their own models.

    How are the .csv files organized?

    For each .csv file the following apply:
    • each row represents the data of a single vehicle
    • the first 10 columns in the 1st row include the columns’ names
    • the first 4 columns include information about the trajectory like the unique trackID, the type of vehicle, the distance traveled in meters and the average speed of the vehicle in km/h
    • the last 6 columns are then repeated every 6 columns based on the time frequency. For example, column_5 contains the latitude of the vehicle at time column_10, and column_11 contains the latitude of the vehicle at time column_16.
    • Speed is in km/h, longitudinal and lateral acceleration in m/s², and time in seconds.
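    The row layout described above (4 fixed columns followed by repeating groups of 6) can be unpacked as in the sketch below. Several details are assumptions rather than facts from the description: the semicolon delimiter, the column names, the order of the columns between latitude and time, and the sample values, which are invented for illustration.

```python
from typing import Dict, List

# Hypothetical names; the description only fixes latitude as the first and
# time as the last column of each repeated group of six.
FIXED = ["track_id", "type", "traveled_d", "avg_speed"]
REPEATED = ["lat", "lon", "speed", "lon_acc", "lat_acc", "time"]


def parse_vehicle_row(row: str, delimiter: str = ";") -> Dict[str, object]:
    """Split one vehicle row into its fixed fields and a list of trajectory points."""
    fields = [f.strip() for f in row.split(delimiter) if f.strip()]
    vehicle: Dict[str, object] = dict(zip(FIXED, fields[:4]))
    samples = fields[4:]
    points: List[Dict[str, float]] = []
    step = len(REPEATED)
    # Walk the remaining fields six at a time, ignoring any incomplete tail.
    for start in range(0, len(samples) - step + 1, step):
        chunk = samples[start:start + step]
        points.append({name: float(value) for name, value in zip(REPEATED, chunk)})
    vehicle["trajectory"] = points
    return vehicle


# Invented sample row with two time steps.
row = "1; Car; 48.85; 9.77; 37.977; 23.737; 4.9; 0.05; -0.02; 0.0; 37.978; 23.738; 5.1; 0.04; -0.01; 0.04; "
vehicle = parse_vehicle_row(row)
print(vehicle["track_id"], len(vehicle["trajectory"]))
```

    Because every vehicle is tracked for a different duration, rows have different lengths, so reading the file row by row in this way may be easier than loading it as a rectangular table.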

    For more details about the pNEUMA dataset, please check our website at https://open-traffic.epfl.ch

  19. Annual Average Daily Traffic Locations in Minnesota

    • gisdata.mn.gov
    fgdb, gpkg, html +3
    Updated Nov 27, 2025
    Cite
    Transportation Department (2025). Annual Average Daily Traffic Locations in Minnesota [Dataset]. https://gisdata.mn.gov/dataset/trans-aadt-traffic-count-locs
    Explore at:
    Available download formats: shp, html, webapp, gpkg, jpeg, fgdb
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    Transportation Department
    Area covered
    Minnesota
    Description

    AADT represents current (most recent) Annual Average Daily Traffic on sampled road systems. This information is displayed using the Traffic Count Locations Active feature class as of the annual HPMS freeze in January. Historical AADT is found in another table. Please note that updates to this dataset are on an annual basis, therefore the data may not match ground conditions or may not be available for new roadways. Resource Contact: Christy Prentice, Traffic Forecasting & Analysis (TFA), http://www.dot.state.mn.us/tda/contacts.html#TFA

    Check other metadata records in this package for more information on Annual Average Daily Traffic Locations Information.


    Link to ESRI Feature Service:

    Annual Average Daily Traffic Locations in Minnesota: Annual Average Daily Traffic Locations


  20. Social Media Engagement Report

    • kaggle.com
    zip
    Updated Apr 13, 2024
    Cite
    Ali Reda Elblgihy (2024). Social Media Engagement Report [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/social-media-engagement-report
    Explore at:
    Available download formats: zip (49114657 bytes)
    Dataset updated
    Apr 13, 2024
    Authors
    Ali Reda Elblgihy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    *****Documentation Process*****

    1. Data Preparation:
       - Load the data into Power Query to assess quality and identify any duplicate values.
       - Verify data quality and types for each column, correcting any misspellings or inconsistencies.
    2. Data Management:
       - Duplicate the original data sheet for future reference and label the new sheet "Working File" to preserve the integrity of the original dataset.
    3. Understanding Metrics:
       - Clarify the meaning of the column headers, particularly the distinction between Impressions and Reach, and understand how Engagement Rate is calculated.
       - Engagement Rate formula: total likes, comments, and shares divided by Reach.
    4. Data Integrity Assurance:
       - Recognize that Impressions should outnumber Reach, because Impressions count total views while Reach counts the unique audience.
       - Investigate discrepancies between Reach and Impressions, identifying and resolving root causes so that reporting and analysis are accurate.
    5. Data Correction:
       - Work with the relevant team to understand the root cause of the discrepancies between Impressions and Reach and to correct the inaccurate values.
       - Identify instances that violate the expected relationship between Impressions and Reach (step 4), potentially attributable to data transformation errors.
       - After the correction, update the dataset so that it reflects the corrected Impressions and Reach values.
       - Recalculate the Engagement Rate from the corrected values.
    6. Data Enhancement:
       - Categorize Audience Age into three groups in a new column named "Age Group": "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (<30 years).
       - Split date and time into separate columns using the text-to-columns option for easier analysis.
    7. Temporal Analysis:
       - Introduce a new column for "Weekend and Weekday," renamed "Weekday Type," to reveal patterns and trends in engagement.
       - Define time periods by categorizing times into "Morning," "Afternoon," "Evening," and "Night" based on time intervals.
    8. Sentiment Analysis:
       - Populate blank cells in the Sentiment column with "Mixed Sentiment," denoting content that contains both positive and negative sentiments or is ambiguous.
    9. Geographical Analysis:
       - Group countries and obtain continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php).
       - Add a new column for "Audience Continent" and use the XLOOKUP function to retrieve the corresponding continent.
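    The derived columns described in the process above can be sketched in Python. This is only an illustration, not the workbook's actual logic: the function names are hypothetical, the age boundaries at 30 and 45 are an assumption (the stated ranges leave those two ages ambiguous), the hour ranges for each time period are an assumption (the description does not give them), and Saturday and Sunday are assumed to be the weekend.

```python
from datetime import datetime


def engagement_rate(likes, comments, shares, reach):
    """Total likes, comments, and shares divided by Reach."""
    if reach <= 0:
        raise ValueError("reach must be positive")
    return (likes + comments + shares) / reach


def age_group(age):
    # Assumed boundaries: age 45 counts as senior, age 30 as adolescent.
    if age >= 45:
        return "Senior Adults"
    if age >= 31:
        return "Mature Adults"
    return "Adolescent Adults"


def weekday_type(ts):
    # datetime.weekday(): Monday is 0, Sunday is 6.
    # Assumption: the weekend is Saturday and Sunday.
    return "Weekend" if ts.weekday() >= 5 else "Weekday"


def time_period(ts):
    # Assumed hour ranges; the source does not state the intervals.
    hour = ts.hour
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"


def violates_reach_rule(impressions, reach):
    # Step 4: Impressions (total views) should not be smaller than Reach.
    return reach > impressions


post_time = datetime(2024, 6, 15, 22, 30)
print(engagement_rate(50, 30, 20, 1000), age_group(47),
      weekday_type(post_time), time_period(post_time))
```

    Keeping each rule in its own small function makes it easy to swap in the real boundaries once they are known.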

    *****Drawing Conclusions and Providing a Summary*****

    • The data is equally distributed across different categories, platforms, and over the years.
    • Most of our audience comprises senior adults (aged 45 and above).
    • Most of our audience exhibits mixed sentiments about our posts. However, an equal portion expresses consistent sentiments.
    • The majority of our posts were located in Africa.
    • The number of posts increased from the first year to the second year and remained relatively consistent for the third year.
    • The optimal time for posting is during the night on weekdays.
    • The highest engagement rates were observed in Croatia, followed by Malawi.
    • The number of posts targeting senior adults is significantly higher than the other two categories. However, the engagement rates for mature and adolescent adults are also noteworthy, based on the number of targeted posts.
Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410

Network Traffic Analysis: Data and Code

Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Loyola University Chicago
Authors
Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help    show this help message and exit
-i TXTFILE    input text file
-x X          Add first X number of total packets as features.
-y Y          Add first Y number of negative packets as features.
-z Z          Add first Z number of positive packets as features.
-ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
-s S          Generate samples using size s.
-j

Purpose:

Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

Uses Features.py to calculate the features.
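
The -x, -y, and -z options describe taking the first X total packets, the first Y negative packets, and the first Z positive packets as features. A minimal sketch of that idea follows. It is not the code in Features.py: the function name first_n_features, the order in which the three groups are concatenated, and the zero-padding of short captures are all assumptions made for illustration.

```python
def first_n_features(packets, x=0, y=0, z=0, pad=0):
    """Build a fixed-length feature list from signed packet sizes.

    packets: signed sizes in capture order (negative and positive values
    mark the two directions).
    """
    def take(seq, n):
        # Keep the first n items; pad short sequences so every sample
        # has the same number of features (padding is an assumption).
        head = list(seq)[:n]
        return head + [pad] * (n - len(head))

    negatives = [p for p in packets if p < 0]
    positives = [p for p in packets if p > 0]
    return take(packets, x) + take(negatives, y) + take(positives, z)


print(first_n_features([100, -1500, -1500, 60], x=3, y=1, z=3))
```

Fixed-length output matters because most classifiers expect every sample to have the same number of features.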

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This script then runs machineLearning.py in a tmux session with the necessary file paths and flags.

Options (to be edited within this file):

--evaluate-only to test 5-fold cross-validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to find the best hyperparameters by grid search

Note: the candidate hyperparameters must be added to train_model under 'if not evaluateOnly:'

Note: once the best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and reports results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.
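
To make the 5-fold cross validation concrete, the sketch below shows how sample indices can be divided into folds, with each fold serving once as the test set. It is a library-free illustration, not the code in machineLearning.py: the function name k_fold_indices is hypothetical, and the sketch does not shuffle or stratify, which a real evaluation usually would.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in k_fold_indices(12, k=5):
    print(len(train), test)
```

The reported accuracy is then the average of the per-fold accuracies, so every sample is used for testing exactly once.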

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

Data for each experiment was stored and analyzed as a txt file in which each line contains:

The first number, a classification number denoting which website, query, or VR action is taking place.

The remaining numbers, each of which denotes:

the size of a packet,

and the direction it is traveling:

negative numbers denote incoming packets,

positive numbers denote outgoing packets.
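
A line in this format can be read with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the separator is an assumption (it accepts whitespace or commas, because the description does not say which the files use).

```python
def parse_line(line):
    """Split one record into (class_label, signed_packet_sizes)."""
    # Assumption: values are separated by whitespace and/or commas.
    values = [int(token) for token in line.replace(",", " ").split()]
    if not values:
        raise ValueError("empty line")
    return values[0], values[1:]


def split_directions(packets):
    """Return (incoming_sizes, outgoing_sizes) as positive byte counts."""
    incoming = [-p for p in packets if p < 0]  # negative = incoming
    outgoing = [p for p in packets if p > 0]   # positive = outgoing
    return incoming, outgoing


label, packets = parse_line("3 -1500 60 -40")
print(label, split_directions(packets))
```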

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv files are identical.

Each file includes (from right to left):

The original packet data,

each line of data sorted from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 graph.
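
The same three steps (sort, summarize, compute the CDF) can be sketched in Python. This is not the spreadsheet's actual formula set: the function names are hypothetical, the sample standard deviation is an assumption (the files may use the population form), and the sketch computes an empirical CDF, since the description does not say which kind of CDF was used.

```python
import statistics


def summarize_capture(sizes):
    """Return (sorted_sizes, mean, sample_standard_deviation)."""
    ordered = sorted(sizes)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return ordered, statistics.mean(ordered), statistics.stdev(ordered)


def empirical_cdf(sizes):
    """Return (size, fraction of packets with size <= that value) pairs."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


capture = [300, 100, 200, 400]
print(summarize_capture(capture))
print(empirical_cdf(capture))
```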
