100+ datasets found
  1. similarweb.com Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Aug 1, 2025
    Cite
    (2025). similarweb.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/similarweb.com
    Dataset updated
    Aug 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for similarweb.com as of August 2025

  2. Search Engine Comparison Data

    • dorik.com
    Updated Sep 9, 2024
    Cite
    Zaki Rezwana Chowdhury (2024). Search Engine Comparison Data [Dataset]. https://dorik.com/blog/alternative-search-engines
    Dataset updated
    Sep 9, 2024
    Authors
    Zaki Rezwana Chowdhury
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Geo-Blocking, Search Quality, Privacy Protection, Additional Features, Environmental Impact, Ads/Sponsored Results, Censorship of Information
    Description

    Feature comparison matrix of Google alternative search engines

  3. Websites using Same But Different

    • webtechsurvey.com
    csv
    Cite
    WebTechSurvey, Websites using Same But Different [Dataset]. https://webtechsurvey.com/technology/same-but-different
    Available download formats: csv
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Same But Different technology, compiled through global website indexing conducted by WebTechSurvey.

  4. Curlie Enhanced with LLM Annotations: Two Datasets for Advancing...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Dec 21, 2023
    Cite
    Peter Nutter; Mika Senghaas; Ludek Cizinsky (2023). Curlie Enhanced with LLM Annotations: Two Datasets for Advancing Homepage2Vec's Multilingual Website Classification [Dataset]. http://doi.org/10.5281/zenodo.10413068
    Available download formats: csv
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Nutter; Mika Senghaas; Ludek Cizinsky
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification

    This dataset contains two subsets of labeled website data, specifically created to enhance the performance of Homepage2Vec, a multi-label model for website classification. The datasets were generated using Large Language Models (LLMs) to provide more accurate and diverse topic annotations for websites, addressing a limitation of existing Homepage2Vec training data.

    Key Features:

    • LLM-generated annotations: Both datasets feature website topic labels generated using LLMs, a novel approach to creating high-quality training data for website classification models.
    • Improved multi-label classification: Fine-tuning Homepage2Vec with these datasets has been shown to improve its macro F1 score from 38% to 43% evaluated on a human-labeled dataset, demonstrating their effectiveness in capturing a broader range of website topics.
    • Multilingual applicability: The datasets facilitate classification of websites in multiple languages, reflecting the inherent multilingual nature of Homepage2Vec.
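    The macro F1 score mentioned above averages the per-label F1 scores, so every topic counts equally regardless of how many websites carry it. The sketch below illustrates that computation for multi-label data. The function name `macro_f1` and the example labels are hypothetical, and this is not the evaluation code used by the dataset authors.

```python
from typing import List, Set


def macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Average the per-label F1 scores, weighting every label equally."""
    # Collect every label that appears in the ground truth or the predictions.
    labels = sorted(set().union(*y_true, *y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Three hypothetical websites with human labels and model predictions.
truth = [{"Arts", "News"}, {"Sports"}, {"News"}]
preds = [{"Arts"}, {"Sports", "News"}, {"News"}]
print(round(macro_f1(truth, preds), 4))
```

    In this toy example "Arts" and "Sports" each score 1.0 and "News" scores 0.5, so the macro average is 2.5 / 3.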

    Dataset Composition:

    • curlie-gpt3.5-10k: 10,000 websites labeled using GPT-3.5, context 2 and 1-shot
    • curlie-gpt4-10k: 10,000 websites labeled using GPT-4, context 2 and zero-shot

    Intended Use:

    • Fine-tuning and advancing Homepage2Vec or similar website classification models
    • Research on LLM-generated datasets for text classification tasks
    • Exploration of multilingual website classification

    Additional Information:

    Acknowledgments:

    This dataset was created as part of a project at EPFL's Data Science Lab (DLab) in collaboration with Prof. Robert West and Tiziano Piccardi.

  5. Fraudulent Bank Websites, Phishing E-mails and Similar Scams | DATA.GOV.HK

    • data.gov.hk
    Updated Dec 14, 2018
    Cite
    data.gov.hk (2018). Fraudulent Bank Websites, Phishing E-mails and Similar Scams | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-hkma-banksvf-fraudulent-bank-scams
    Dataset updated
    Dec 14, 2018
    Dataset provided by
    data.gov.hk
    Description

    This API provides information on press releases issued by authorized institutions, and on similar press releases previously issued by the HKMA, regarding fraudulent bank websites, phishing e-mails, and similar scams.

  6. Core Web Vitals vs Bing Performance Metrics: Understanding Different Page...

    • caseysseo.com
    txt
    Updated Aug 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Casey Miller (2025). Core Web Vitals vs Bing Performance Metrics: Understanding Different Page Experience Signals [Dataset]. https://caseysseo.com/core-web-vitals-vs-bing-performance-metrics-understanding-different-page-experience-signals
    Available download formats: txt
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    Colorado Springs, Colorado
    Variables measured
    Bing Page Size, Bing Page Load Time, First Input Delay (FID), Google Top 10 LCP Scores, Bing Server Response Time, Cumulative Layout Shift (CLS), Largest Contentful Paint (LCP), Colorado Springs Mobile Search Growth
    Measurement technique
    Synthetic testing and simulated visits used by Bing, Analysis of search engine ranking data and industry benchmarks, Real User Monitoring (RUM) data from Google Chrome users
    Description

    This dataset explores the differences between Google's Core Web Vitals and Bing's performance metrics, providing insights into how these different page experience signals impact search engine rankings. It covers the key metrics measured by each search engine, their approaches to performance evaluation, and the real-world impact on local business rankings. The data includes analysis of measurement techniques, industry benchmarks, and the evolving role of page speed in search algorithms.

  7. Watching paid content on websites like Netflix and HBO in Norway 2009-2020

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Watching paid content on websites like Netflix and HBO in Norway 2009-2020 [Dataset]. https://www.statista.com/statistics/981176/watching-paid-content-on-websites-like-netflix-and-hbo-in-norway/
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Norway
    Description

    The share of individuals watching paid content on websites like Netflix and HBO in Norway generally increased from 2009 to 2020. In 2009, the share amounted to three percent of respondents, whereas in 2020 it reached ** percent.

  8. Traffic Acquisition to LAMs Websites

    • data.niaid.nih.gov
    Updated Apr 30, 2022
    Cite
    Dimitrios Kouis (2022). Traffic Acquisition to LAMs Websites [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6505276
    Dataset updated
    Apr 30, 2022
    Dataset provided by
    Dimitrios Kouis
    Ioannis C. Drivas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary research efforts regarding social media platforms and their contribution to website traffic in LAMs. Through the Similar Web API, the leading social networks (Facebook, Twitter, YouTube, Instagram, Reddit, Pinterest, LinkedIn) that drove traffic to each of the 220 cases in our dataset were identified and analyzed in the first sheet. Aggregated results showed that Facebook was responsible for 46.1% of social traffic (second sheet).

  9. Websites using Like Post

    • webtechsurvey.com
    csv
    Updated Oct 12, 2025
    Cite
    WebTechSurvey (2025). Websites using Like Post [Dataset]. https://webtechsurvey.com/technology/like-post
    Available download formats: csv
    Dataset updated
    Oct 12, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Like Post technology, compiled through global website indexing conducted by WebTechSurvey.

  10. built-different.co Website Traffic, Ranking, Analytics [August 2025]

    • semrush.ebundletools.com
    Updated Sep 16, 2025
    Cite
    Semrush (2025). built-different.co Website Traffic, Ranking, Analytics [August 2025] [Dataset]. https://semrush.ebundletools.com/website/built-different.co/overview/
    Dataset updated
    Sep 16, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Sep 16, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    built-different.co is ranked #13815 in GB with 201.22K Traffic. Categories: . Learn more about website traffic, market share, and more!

  11. A Dataset on Online Learning-based Web Behavior from Different Countries...

    • ieee-dataport.org
    Updated Jul 29, 2025
    Cite
    Saumick Pradhan (2025). A Dataset on Online Learning-based Web Behavior from Different Countries Before and After COVID-19 [Dataset]. https://ieee-dataport.org/open-access/dataset-online-learning-based-web-behavior-different-countries-and-after-covid-19
    Dataset updated
    Jul 29, 2025
    Authors
    Saumick Pradhan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    2022

  12. All Employees, Museums, Historical Sites, and Similar Institutions

    • fred.stlouisfed.org
    json
    Updated Sep 5, 2025
    Cite
    (2025). All Employees, Museums, Historical Sites, and Similar Institutions [Dataset]. https://fred.stlouisfed.org/series/CEU7071200001
    Available download formats: json
    Dataset updated
    Sep 5, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for All Employees, Museums, Historical Sites, and Similar Institutions (CEU7071200001) from Jan 1990 to Aug 2025 about museums, amusements, leisure, hospitality, establishment survey, employment, and USA.

  13. Bounce rate of leading consumer electronics sites worldwide 2025

    • statista.com
    • tokrwards.com
    Updated Sep 4, 2025
    Cite
    Statista (2025). Bounce rate of leading consumer electronics sites worldwide 2025 [Dataset]. https://www.statista.com/statistics/1325859/consumer-electronics-websites-bounce-rate-worldwide/
    Dataset updated
    Sep 4, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2025
    Area covered
    Worldwide
    Description

    Among selected consumer electronics retailers worldwide, thegioididong.com recorded the highest bounce rate in July 2025, at approximately ***** percent. apple.com had a slightly lower bounce rate of nearly ***** percent. Among selected consumer electronics e-tailers, sony.com had the lowest bounce rate at ***** percent. Bounce rate is a marketing term used in web traffic analysis reflecting the percentage of visitors who enter the site and then leave without taking any further action, like making a purchase or viewing other pages within the website ("bounce").

    A sector with growth potential

    With one of the lowest online shopping cart abandonment rates globally in 2022, consumer electronics is a burgeoning e-commerce segment that places itself at the crossroads between technological progress and digital transformation. Boosted by the pandemic-induced surge in online shopping, the global market size of consumer electronics e-commerce was estimated at more than *** billion U.S. dollars in 2021 and forecast to nearly double less than five years later.

    Amazon and Apple lead the charts in electronics e-commerce

    With more than ** billion U.S. dollars in e-commerce net sales in the consumer electronics segment in 2022, apple.com was the uncontested industry leader. The global powerhouse surpassed e-commerce giants amazon.com and jd.com with more than *** billion U.S. dollars difference in online sales in the consumer electronics category.
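    As a rough illustration of the bounce rate definition above, the sketch below computes the percentage of sessions that ended without any further action. It assumes, as a simplification, that a session with at most one page view counts as a bounce. The function name `bounce_rate` and the sample numbers are hypothetical and do not come from the Statista data.

```python
from typing import List


def bounce_rate(pages_per_session: List[int]) -> float:
    """Return the percentage of sessions that viewed at most one page."""
    if not pages_per_session:
        return 0.0
    # Assumption: a single-page session is treated as a bounce.
    bounces = sum(1 for pages in pages_per_session if pages <= 1)
    return 100.0 * bounces / len(pages_per_session)


# Four hypothetical sessions, two of which left after the landing page.
print(bounce_rate([1, 3, 1, 5]))
```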

  14. United States - Expensed Purchases of Software for Museums, Historical...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Nov 6, 2017
    Cite
    TRADING ECONOMICS (2017). United States - Expensed Purchases of Software for Museums, Historical Sites, and Similar Institutions, All Establishments, Employer Firms [Dataset]. https://tradingeconomics.com/united-states/expensed-purchases-of-software-for-museums-historical-sites-and-similar-institutions-all-establishments-employer-firms-fed-data.html
    Available download formats: json, xml, excel, csv
    Dataset updated
    Nov 6, 2017
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    United States - Expensed Purchases of Software for Museums, Historical Sites, and Similar Institutions, All Establishments, Employer Firms was 128.00000 Mil. of $ in January of 2022, according to the United States Federal Reserve. Historically, United States - Expensed Purchases of Software for Museums, Historical Sites, and Similar Institutions, All Establishments, Employer Firms reached a record high of 128.00000 in January of 2022 and a record low of 20.00000 in January of 2005. Trading Economics provides the current actual value, an historical data chart and related indicators for United States - Expensed Purchases of Software for Museums, Historical Sites, and Similar Institutions, All Establishments, Employer Firms - last updated from the United States Federal Reserve on September of 2025.

  15. same.new Website Traffic, Ranking, Analytics [September 2025]

    • semrush.ebundletools.com
    Updated Oct 11, 2025
    Cite
    Semrush (2025). same.new Website Traffic, Ranking, Analytics [September 2025] [Dataset]. https://semrush.ebundletools.com/website/same.new/overview/
    Dataset updated
    Oct 11, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Oct 11, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    same.new is ranked #23109 in IN with 533.98K Traffic. Categories: . Learn more about website traffic, market share, and more!

  16. ScrapeHero Data Cloud - Free and Easy to use

    • datarade.ai
    .json, .csv
    Updated Apr 11, 2022
    Cite
    Scrapehero (2022). ScrapeHero Data Cloud - Free and Easy to use [Dataset]. https://datarade.ai/data-products/scrapehero-data-cloud-free-and-easy-to-use-scrapehero
    Available download formats: .json, .csv
    Dataset updated
    Apr 11, 2022
    Dataset provided by
    ScrapeHero
    Authors
    Scrapehero
    Area covered
    Bahamas, Bhutan, Ghana, Dominica, Portugal, Slovakia, Anguilla, Niue, Chad, Bahrain
    Description

    The Easiest Way to Collect Data from the Internet

    Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers or a few lines of code using our APIs.

    We have made it as simple as possible to collect data from websites.

    Easy to Use Crawlers

    • Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.
    • Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.
    • Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.
    • Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more from Amazon.
    • Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.
    • Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.
    • Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.
    • Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.
    • Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.
    • Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.
    • Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.
    • Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.
    • Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.
    • Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.
    • LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.
    • Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.
    • Yelp Business Details Scraper: Scrape business details from Yelp such as phone number, address, website, and more from Yelp search and business details pages.
    • Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, broker, broker name, and more.
    • Amazon product offers and third party sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.
    • Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.
    • Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.
    • Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.
    • Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, answer, answered user name, and more.
    • Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.

  17. Common languages used for web content 2025, by share of websites

    • statista.com
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while the content in the German language followed, with 5.6 percent.

    English as the leading online language

    United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most of the online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of its populations accessing the internet.

  18. oh-like.com Website Traffic, Ranking, Analytics [September 2025]

    • sem2.almunjizun.com
    Updated Oct 12, 2025
    Cite
    Semrush (2025). oh-like.com Website Traffic, Ranking, Analytics [September 2025] [Dataset]. https://sem2.almunjizun.com/website/oh-like.com/overview/
    Dataset updated
    Oct 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://www.semrush.com/company/legal/terms-of-service/

    Time period covered
    Oct 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    oh-like.com is ranked #122362 in TH with 1.2K Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!

  19. All Employees: Leisure and Hospitality: Museums, Historical Sites, and...

    • fred.stlouisfed.org
    json
    Updated Sep 20, 2025
    Cite
    (2025). All Employees: Leisure and Hospitality: Museums, Historical Sites, and Similar Institutions in Illinois [Dataset]. https://fred.stlouisfed.org/series/SMU17000007071200001SA
    Available download formats: json
    Dataset updated
    Sep 20, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Area covered
    Illinois
    Description

    Graph and download economic data for All Employees: Leisure and Hospitality: Museums, Historical Sites, and Similar Institutions in Illinois (SMU17000007071200001SA) from Jan 1990 to Aug 2025 about museums, leisure, hospitality, IL, employment, and USA.

  20. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    Updated Jun 12, 2024
    Cite
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Loyola University Chicago
    Authors
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j
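    One plausible reading of the -x, -y and -z options is sketched below: take the first X packets overall, the first Y negative packets, and the first Z positive packets, and concatenate them into one feature list. The helper `select_features`, its defaults, and the concatenation order are assumptions made for illustration; this is not the code in pkt_features.py.

```python
from typing import List


def select_features(packets: List[int], x: int = 0, y: int = 0, z: int = 0) -> List[int]:
    """Build a feature list from the first x, y and z packets of each kind."""
    # Signed sizes: negative and positive values mark the two directions.
    negative = [p for p in packets if p < 0]
    positive = [p for p in packets if p > 0]
    # Assumed layout: total packets first, then negative, then positive.
    return packets[:x] + negative[:y] + positive[:z]


# Hypothetical packet trace for a single capture.
trace = [-1500, 60, -400, 52, 52]
print(select_features(trace, x=2, y=1, z=2))
```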

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for this experiment was stored and analyzed as one txt file per experiment. Each line contains:

    • First, a classification number denoting which website, query, or VR action is taking place.
    • Then the remaining numbers, each denoting the size of a packet and the direction it is traveling: negative numbers denote incoming packets, and positive numbers denote outgoing packets.
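    The line format described above can be read with a few lines of Python. The sketch below is only an illustration: the `Capture` type and `parse_capture` function are hypothetical names, and the assumption that values are separated by whitespace or commas is not confirmed by the dataset description.

```python
import re
from typing import List, NamedTuple


class Capture(NamedTuple):
    label: int            # classification number (website, query, or VR action)
    incoming: List[int]   # sizes of incoming packets (negative in the file)
    outgoing: List[int]   # sizes of outgoing packets (positive in the file)


def parse_capture(line: str) -> Capture:
    """Split one line into its label and the two directions of packet sizes."""
    # Assumption: values are separated by whitespace and/or commas.
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])
    packets = [int(t) for t in tokens[1:]]
    # Zero-valued entries carry no direction and are dropped in this sketch.
    incoming = [-p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    return Capture(label, incoming, outgoing)


print(parse_capture("3 -1500 60 -400 52"))
```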

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
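    The steps listed above (sort the packet sizes, compute the mean and standard deviation, then derive a CDF) can be sketched as follows. This is an illustrative empirical CDF, not the spreadsheet calculation behind Figure 4. The function name `empirical_cdf` and the sample sizes are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics
from typing import List, Tuple


def empirical_cdf(sizes: List[int]) -> List[Tuple[int, float]]:
    """Return (packet size, cumulative fraction) pairs in ascending size order."""
    ordered = sorted(sizes)
    n = len(ordered)
    # The i-th smallest value (1-based) has cumulative fraction i / n.
    return [(size, (i + 1) / n) for i, size in enumerate(ordered)]


# Hypothetical packet sizes from one capture.
sizes = [60, 1500, 52, 400]
print("mean:", statistics.mean(sizes))
print("std:", statistics.stdev(sizes))  # assumption: sample standard deviation
for size, fraction in empirical_cdf(sizes):
    print(size, fraction)
```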
