100+ datasets found
  1. The best websites specialized in wine 2017

    • statista.com
    Updated Jul 5, 2021
    Statista (2021). The best websites specialized in wine 2017 [Dataset]. https://www.statista.com/statistics/790008/the-best-websites-specialized-in-wine-worldwide/
    Dataset updated
    Jul 5, 2021
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    France, Worldwide
    Description

    This statistic shows a ranking of the best websites specializing in wine sales in 2017. That year, the website "www.wine.com" ranked first among online sales companies specializing in wine.

  2. Total global visitor traffic to user-generated content websites 2024

    • statista.com
    • ai-chatbox.pro
    Updated Nov 8, 2024
    Statista (2024). Total global visitor traffic to user-generated content websites 2024 [Dataset]. https://www.statista.com/statistics/1328702/web-visitor-traffic-top-websites-ugc/
    Dataset updated
    Nov 8, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2024
    Area covered
    Worldwide
    Description

    In March 2024, the video platform YouTube reported around 32.5 billion visits from global users. Meta-owned Facebook.com reported around 16.1 billion visits, followed by Instagram.com and Twitter.com with 7 billion and 6.1 billion visits, respectively, from users worldwide during the examined month. Wikipedia.org, which hosts user-generated encyclopedic entries, recorded around 4.4 billion visits, while news aggregator and community platform Reddit.com saw approximately 2.2 billion visits during the examined period.

  3. Top features of SME websites in the U.S. 2024

    • statista.com
    Updated Jun 23, 2025
    Statista (2025). Top features of SME websites in the U.S. 2024 [Dataset]. https://www.statista.com/statistics/1461122/features-sme-websites/
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    An analysis showed that, as of April 2024, only ** percent of small business home pages in the United States provided users with contact information for the company they represented. The most commonly featured elements were photographs and call-to-action buttons, included on ** percent and ** percent of SME home pages, respectively.

  4. WordPress Statistics

    • wpshout.com
    • bradswebdesigns.com
    Updated Jun 3, 2023
    VertiStudio (2023). WordPress Statistics [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Jun 3, 2023
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2023
    Area covered
    Global
    Description

    Browse the most interesting pieces of data and statistics from around the world of WordPress. Use them whenever you’re working on a new article, blog post, infographic, or whatever else you have in store.

  5. Wix vs Squarespace Statistics – Which Is Best? (2025)

    • electroiq.com
    Updated Jun 23, 2025
    Electro IQ (2025). Wix vs Squarespace Statistics – Which Is Best? (2025) [Dataset]. https://electroiq.com/stats/wix-vs-squarespace-statistics/
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Electro IQ
    License

    https://electroiq.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Wix vs Squarespace Statistics: In recent years, Wix and Squarespace have been regarded as the two most popular platforms for website creation. They are suited to businesses, individuals, and creators. Wix.com Ltd., or simply Wix, is an Israeli software company that provides cloud-based web development services and offers tools for creating HTML5 websites for desktop and mobile platforms using online drag-and-drop editing.

    Squarespace, Inc. is an American company that provides software as a service for website building and hosting. It lets users create and modify webpages with pre-built website templates and drag-and-drop elements. This article gathers information and statistical analyses from several sources to help you understand the platforms better and choose the best option.

  6. WordPress security statistics

    • wpshout.com
    • bradswebdesigns.com
    Updated Apr 18, 2024
    VertiStudio (2024). WordPress security statistics [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    According to research by Sucuri, 60.04% of the websites analyzed contained at least one backdoor and 52.6% contained some form of SEO spam; 95.62% of those websites run on WordPress.

  7. WordPress community statistics

    • wpshout.com
    • bradswebdesigns.com
    Updated Apr 18, 2024
    VertiStudio (2024). WordPress community statistics [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The total number of WordCamps ever held is growing rapidly: more than 1,091 have been organized so far, in 373 cities across 65 countries on 6 continents.

  8. NIST Statistical Reference Datasets - SRD 140

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Jul 29, 2022
    National Institute of Standards and Technology (2022). NIST Statistical Reference Datasets - SRD 140 [Dataset]. https://catalog.data.gov/dataset/nist-statistical-reference-datasets-srd-140-df30c
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets is also supported by the Standard Reference Data Program.

  9. Share of top U.S. websites ignoring user privacy preferences 2024

    • statista.com
    Updated Mar 4, 2025
    Statista (2025). Share of top U.S. websites ignoring user privacy preferences 2024 [Dataset]. https://www.statista.com/statistics/1560221/us-privacy-preference-ignoring/
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2024
    Area covered
    United States
    Description

    As of September 2024, 75 percent of the 100 most visited websites in the United States shared personal data with third-party advertisers even when users opted out. Moreover, 70 percent of them dropped third-party advertising cookies even when users opted out.

  10. WordPress plugin statistics

    • wpshout.com
    • bradswebdesigns.com
    Updated Apr 18, 2024
    VertiStudio (2024). WordPress plugin statistics [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    59,000+ WordPress plugins are in the official directory, with new ones being added daily.

  11. Vatican Data, Year of Statistical Data

    • catholic-geo-hub-cgisc.hub.arcgis.com
    • hub.arcgis.com
    Updated Oct 22, 2019
    burhansm2 (2019). Vatican Data, Year of Statistical Data [Dataset]. https://catholic-geo-hub-cgisc.hub.arcgis.com/maps/36fcd8c2e2b04b48bcbc19602dcda867
    Dataset updated
    Oct 22, 2019
    Dataset authored and provided by
    burhansm2
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    Vatican Data Series {title at top of page}

    Data Developers: Burhans, Molly A., Cheney, David M., Emege, Thomas, Gerlt, R.. . “Vatican Data Series {title at top of page}”. Scale not given. Version 1.0. MO and CT, USA: GoodLands Inc., Catholic Hierarchy, Environmental Systems Research Institute, Inc., 2019.

    Web map developer: Molly Burhans, October 2019

    Web app developer: Molly Burhans, October 2019

    GoodLands’ polygon data layers, version 2.0 for global ecclesiastical boundaries of the Roman Catholic Church:

    Although care has been taken to ensure the accuracy, completeness and reliability of the information provided, this is the first developed dataset of global ecclesiastical boundaries curated from many sources, so it may have a higher margin of error than established geopolitical administrative boundary maps. Boundaries need to be verified with appropriate Ecclesiastical Leadership. The current information is subject to change without notice. No parties involved with the creation of this data are liable for indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information. We referenced 1960 sources to build our global datasets of ecclesiastical jurisdictions. Often, they were isolated images of dioceses, historical documents and information about parishes that were cross checked. These sources can be viewed here: https://docs.google.com/spreadsheets/d/11ANlH1S_aYJOyz4TtG0HHgz0OLxnOvXLHMt4FVOS85Q/edit#gid=0

    To learn more or contact us please visit: https://good-lands.org/

    The Catholic Leadership global maps information is derived from the Annuario Pontificio, which is curated and published by the Vatican Statistics Office annually, and digitized by David Cheney at Catholic-Hierarchy.org. Updates are supplemented with diocesan and news announcements. GoodLands maps this into global ecclesiastical boundaries.

    Admin 3 Ecclesiastical Territories:

    Burhans, Molly A., Cheney, David M., Gerlt, R.. . “Admin 3 Ecclesiastical Territories For Web”. Scale not given. Version 1.2. MO and CT, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2019.

    Derived from:

    Global Diocesan Boundaries:

    Burhans, M., Bell, J., Burhans, D., Carmichael, R., Cheney, D., Deaton, M., Emge, T. Gerlt, B., Grayson, J., Herries, J., Keegan, H., Skinner, A., Smith, M., Sousa, C., Trubetskoy, S. “Diocesean Boundaries of the Catholic Church” [Feature Layer]. Scale not given. Version 1.2. Redlands, CA, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2016.

    Using: ArcGIS. 10.4. Version 10.0. Redlands, CA: Environmental Systems Research Institute, Inc., 2016.

    Boundary Provenance

    Statistics and Leadership Data

    Cheney, D.M. “Catholic Hierarchy of the World” [Database]. Date Updated: August 2019. Catholic Hierarchy. Using: Paradox. Retrieved from Original Source.

    Catholic Hierarchy

    Annuario Pontificio per l’Anno .. Città del Vaticano: Tipografia Poliglotta Vaticana, Multiple Years.

    The data for these maps was extracted from the gold standard of Church data, the Annuario Pontificio, published yearly by the Vatican. The collection and data development of the Vatican Statistics Office are unknown. GoodLands is not responsible for errors within this data. We encourage people to document and report errant information to us at data@good-lands.org or directly to the Vatican.

    Additional information about regular changes in bishops and sees comes from a variety of public diocesan and news announcements.

  12. WordPress development statistics

    • wpshout.com
    • bradswebdesigns.com
    Updated Apr 18, 2024
    VertiStudio (2024). WordPress development statistics [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There have been 43 major versions of WordPress released since the platform’s inception.

  13. Football API | World Plan | SportMonks Sports data for 100 + leagues...

    • datarade.ai
    .json
    Updated Jun 9, 2021
    SportMonks (2021). Football API | World Plan | SportMonks Sports data for 100 + leagues worldwide [Dataset]. https://datarade.ai/data-products/football-api-world-plan-sportsdata-for-100-leagues-worldwide-sportmonks
    Available download formats: .json
    Dataset updated
    Jun 9, 2021
    Dataset authored and provided by
    SportMonks
    Area covered
    Ukraine, Romania, China, United Kingdom, Malta, Switzerland, Iran (Islamic Republic of), Poland, United Arab Emirates, United States of America
    Description

    Use our trusted SportMonks Football API to build your own sports application and be at the forefront of football data today.

    Our Football API is designed for iGaming, media, developers and football enthusiasts alike, ensuring you can create a football application that meets your needs.

    Over 20,000 sports fanatics make use of our data. We know what data works best for you, so we ensured that our Football API has all the necessary tools you need to create a successful football application.

    • Livescores and schedules: Our Football API features extremely fast livescores and up-to-date season schedules, meaning your app will be the first to notify its customers about a goal scored. This also works to further improve the look and feel of your website.

    • Statistics and line-ups: We offer various kinds of football statistics, ranging from (live) player statistics to team, match and season statistics. And that’s not all - we also provide pre-match lineups for all important leagues.

    • Coverage and historical data: Our Football API covers over 1,200 leagues, all managed by our in-house scouts and data platform. That means there’s up to 14 years of historical data available.

    • Bookmakers and odds: Build your football sportsbook, odds comparison or betting portal with our pre-match and in-play odds collated from all major bookmakers and markets.

    • TV Stations and highlights: Show your customers where the football games are broadcasted and provide video highlights of major match events.

    • Standings and topscorers: Enhance your football website with standings and live standings, and allow your customers to see the top scorers and what the season's standings are.

  14. WordPress theme statistics

    • wpshout.com
    • bradswebdesigns.com
    Updated Apr 18, 2024
    VertiStudio (2024). WordPress theme statistics [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There are more than 5,000 themes in the official theme directory at WordPress.org.

  15. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Available download formats: csv
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset was collected as part of Prac1 for the course Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52,478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).

    The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets: the first 30,000 books and then the remaining 22,478. Dates were not parsed and reformatted in the second chunk, so publishDate and firstPublishDate are represented in mm/dd/yyyy format for the first 30,000 records and in Month Day Year format for the rest.
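Because the two chunks store dates differently, anything that loads publishDate or firstPublishDate has to try both layouts. The following is a minimal sketch, not code from the dataset's repository. It assumes the first layout parses as `%m/%d/%Y` and the second as a full English month name followed by day and year, possibly with a comma or an ordinal suffix such as "14th". The function name `parse_publish_date` is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Candidate layouts; both are assumptions derived from the description above.
_FORMATS = ("%m/%d/%Y", "%B %d %Y")


def parse_publish_date(raw: str) -> Optional[date]:
    """Return a date for either layout, or None if the value is empty or unparseable."""
    if not raw or not raw.strip():
        return None
    # Drop ordinal suffixes ("14th" -> "14") and commas, then collapse whitespace.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", raw.strip())
    cleaned = " ".join(cleaned.replace(",", " ").split())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None
```

Returning None rather than raising keeps incomplete fields (firstPublishDate is only 59 percent complete) from aborting a bulk load. Values that match neither layout should be inspected by hand.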

    Book cover images can optionally be downloaded from the URL in the 'coverImg' field. Python code for doing so, along with an example, can be found in the GitHub repo.

    The 25 fields of the dataset are:

    | Attribute | Definition | Completeness (%) |
    | ------------- | ------------- | ------------- |
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Publisher | 93 |
    | publishDate | Publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |

  16. WordPress usage and popularity

    • wpshout.com
    • bradswebdesigns.com
    Updated Apr 18, 2024
    VertiStudio (2024). WordPress usage and popularity [Dataset]. https://wpshout.com/wordpress-statistics/
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    VertiStudio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The growth of WordPress’ market share is quite impressive. Here’s how these stats played out over the last five years.

  17. WordPress Market Share Statistics

    • searchlogistics.com
    Updated Jan 21, 2025
    (2025). WordPress Market Share Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/wordpress-statistics/
    Dataset updated
    Jan 21, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here is a list of the top content management systems available right now and their total market share.

  18. Top 1000 Arlington County Website Pages

    • catalog.data.gov
    Updated Nov 12, 2020
    Arlington County (2020). Top 1000 Arlington County Website Pages [Dataset]. https://catalog.data.gov/dataset/top-1000-arlington-county-website-pages
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    Arlington County
    Area covered
    Arlington County
    Description

    Monthly statistics for the top 1,000 Arlington County Government public webpages.

  19. ‘Top 1000 Arlington County Website Pages’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 12, 2022
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Top 1000 Arlington County Website Pages’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-top-1000-arlington-county-website-pages-6575/b5235d9d/?iid=002-305&v=presentation
    Dataset updated
    Feb 12, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Arlington County
    Description

    Analysis of ‘Top 1000 Arlington County Website Pages’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/1dd4ea92-3dfe-42e0-a0a1-71f048f4f9d7 on 12 February 2022.

    --- Dataset description provided by original source is as follows ---

    Monthly statistics for the top 1,000 Arlington County Government public webpages.

    --- Original source retains full ownership of the source dataset ---

  20. Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 12, 2024
    Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Chan-Tin, Eric
    Moran, Madeline
    Ferrell, Nathan
    Homan, Sophia
    Honig, Joshua
    Soni, Shreena
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help    show this help message and exit
    -i TXTFILE    input text file
    -x X          Add first X number of total packets as features.
    -y Y          Add first Y number of negative packets as features.
    -z Z          Add first Z number of positive packets as features.
    -ml           Output to text file all websites in the format of websiteNumber1,feature1,feature2,...
    -s S          Generate samples using size s.
    -j

    Purpose:

    Turns a text file containing lists of incoming and outgoing network packet sizes into separate website objects with associated features.

    Uses Features.py to calculate the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the necessary file paths and flags.

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normalization options for each data set as well as the best grid search hyperparameters based on the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queries (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality headset.

    Data for each experiment was stored and analyzed as a txt file in which each line contains:

    • first, a classification number denoting which website, query, or VR action is taking place;

    • then the remaining numbers, each denoting the size of a packet and the direction it is traveling: negative numbers denote incoming packets and positive numbers denote outgoing packets.
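The line layout described above can be read with a few lines of Python. This is a minimal sketch rather than the authors' Packet_Features_Generator.py. The delimiter is not stated in the description, so the sketch assumes values are separated by whitespace and/or commas. The names `Capture` and `parse_capture_line` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Capture(NamedTuple):
    label: int           # classification number (website, query, or VR action)
    incoming: List[int]  # sizes of incoming packets (negative in the file)
    outgoing: List[int]  # sizes of outgoing packets (positive in the file)


def parse_capture_line(line: str) -> Capture:
    """Split one line into its label and its incoming/outgoing packet sizes."""
    # Assumption: tokens are separated by commas and/or whitespace.
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])
    packets = [int(t) for t in tokens[1:]]
    # The sign encodes direction; sizes are stored here as positive magnitudes.
    incoming = [-p for p in packets if p < 0]
    outgoing = [p for p in packets if p > 0]
    return Capture(label, incoming, outgoing)
```

Splitting by direction discards the original interleaving of packets. A feature that depends on packet order would need the signed sequence kept as well.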

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv files are identical.

    Each file includes (from right to left):

    The original packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distribution Function (CDF) calculation that generated the Figure 4 Graph.
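The sort-then-accumulate procedure described for the Figure 4 data corresponds to an empirical CDF. The sketch below shows one way to compute it along with the mean and standard deviation. It is not the authors' spreadsheet calculation. The function names are hypothetical, and the population standard deviation (`pstdev`) is an assumption, since the description does not say which variant was used.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def empirical_cdf(sizes: List[int]) -> List[Tuple[int, float]]:
    """Return (packet size, fraction of packets <= that size), smallest size first."""
    ordered = sorted(sizes)
    n = len(ordered)
    cdf: Dict[int, float] = {}
    # For repeated sizes the later rank overwrites the earlier one,
    # so each size keeps its highest cumulative fraction.
    for rank, size in enumerate(ordered, start=1):
        cdf[size] = rank / n
    return list(cdf.items())


def summarize(sizes: List[int]) -> Tuple[float, float]:
    """Return the mean and population standard deviation of one packet capture."""
    return mean(sizes), pstdev(sizes)
```

For example, `empirical_cdf([60, 1500, 60, 52])` yields `[(52, 0.25), (60, 0.75), (1500, 1.0)]`: a quarter of the packets are 52 bytes or smaller and three quarters are 60 bytes or smaller.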
