89 datasets found
  1. w

    Websites using Similar Posts Ontology

    • webtechsurvey.com
    csv
    Updated Oct 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WebTechSurvey (2025). Websites using Similar Posts Ontology [Dataset]. https://webtechsurvey.com/technology/similar-posts-ontology
    Explore at:
    csvAvailable download formats
    Dataset updated
    Oct 11, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/termshttps://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Similar Posts Ontology technology, compiled through global website indexing conducted by WebTechSurvey.

  2. w

    Websites using Similar Posts Ai Spai

    • webtechsurvey.com
    csv
    Updated Nov 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WebTechSurvey (2025). Websites using Similar Posts Ai Spai [Dataset]. https://webtechsurvey.com/technology/similar-posts-ai-spai
    Explore at:
    csvAvailable download formats
    Dataset updated
    Nov 23, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/termshttps://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Similar Posts Ai Spai technology, compiled through global website indexing conducted by WebTechSurvey.

  3. Popular websites across the globe

    • kaggle.com
    zip
    Updated May 27, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    bpali26 (2017). Popular websites across the globe [Dataset]. https://www.kaggle.com/bpali26/popular-websites-across-the-globe
    Explore at:
    zip(639485 bytes)Available download formats
    Dataset updated
    May 27, 2017
    Authors
    bpali26
    Description

    Context

    This dataset includes some of the basic information of the websites we daily use. While scrapping this info, I learned quite a lot in R programming, system speed, memory usage etc. and developed my niche in Web Scrapping. It took about 4-5 hrs for scrapping this data through my system (4GB RAM) and nearly about 4-5 days working out my idea through this project.

    Content

    The dataset contains Top 50 ranked sites from each 191 countries along with their traffic (global) rank. Here, country_rank represent the traffic rank of that site within the country, and traffic_rank represent the global traffic rank of that site.

    Since most of the columns meaning can be derived from their name itself, its pretty much straight forward to understand this dataset. However, there are some instances of confusion which I would like to explain in here:

    1) most of the numeric values are in character format, hence, contain spaces which you might need to clean on.

    2) There are multiple instances of same website. for.e.g. Yahoo. com is present in 179 rows within this dataset. This is due to their different country rank in each country.

    3)The information provided in this dataset is for the top 50 websites in 191 countries as on 25th May 2017 and is subjected to change in future time due to the dynamic structure of ranking.

    4) The dataset inactual contains 9540 rows instead of 9550(50*191 rows). This was due to the unavailability of information for 10 websites.

    PS: in case if there are anymore queries, comment on this, I'll add an answer to that in above list.

    Acknowledgements

    I wouldn't have done this without the help of others. I've scrapped this information from publicly available (open to all) websites namely: 1) http://data.danetsoft.com/ 2) http://www.alexa.com/topsites , of which i'm highly grateful. I truly appreciate and thanks the owner of these sites for providing us with the information that I included today in this dataset.

    Inspiration

    I feel that there this a lot of scope for exploring & visualization this dataset to find out the trends in the attributes of these websites across countries. Also, one could try predicting the traffic(global) rank being a dependent factor on the other attributes of the website. In any case, this dataset will help you find out the popular sites in your area.

  4. Leading websites worldwide 2025, by monthly visits

    • statista.com
    • boostndoto.org
    Updated Oct 29, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Leading websites worldwide 2025, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Aug 2025
    Area covered
    Worldwide
    Description

    In August 2025, Google.com was the most visited website worldwide, with an average of 98.2 billion monthly visits. The platform has maintained its leading position since June 2010, when it surpassed Yahoo to take first place. YouTube ranked second during the same period, recording over 48 billion monthly visits. The internet leaders: search, social, and e-commerce Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene. What is next for online content? Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signal that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.

  5. and-just-like-that.org Website Traffic, Ranking, Analytics [September 2025]

    • semrush.ebundletools.com
    Updated Oct 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). and-just-like-that.org Website Traffic, Ranking, Analytics [September 2025] [Dataset]. https://semrush.ebundletools.com/website/and-just-like-that.org/overview/
    Explore at:
    Dataset updated
    Oct 12, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Oct 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    and-just-like-that.org is ranked #4353 in RU with 338.19K Traffic. Categories: . Learn more about website traffic, market share, and more!

  6. Z

    Network Traffic Analysis: Data and Code

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Loyola University Chicago
    Authors
    Moran, Madeline; Honig, Joshua; Ferrell, Nathan; Soni, Shreena; Homan, Sophia; Chan-Tin, Eric
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code:

    Packet_Features_Generator.py & Features.py

    To run this code:

    pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

    -h, --help show this help message and exit -i TXTFILE input text file -x X Add first X number of total packets as features. -y Y Add first Y number of negative packets as features. -z Z Add first Z number of positive packets as features. -ml Output to text file all websites in the format of websiteNumber1,feature1,feature2,... -s S Generate samples using size s. -j

    Purpose:

    Turns a text file containing lists of incomeing and outgoing network packet sizes into separate website objects with associative features.

    Uses Features.py to calcualte the features.

    startMachineLearning.sh & machineLearning.py

    To run this code:

    bash startMachineLearning.sh

    This code then runs machineLearning.py in a tmux session with the nessisary file paths and flags

    Options (to be edited within this file):

    --evaluate-only to test 5 fold cross validation accuracy

    --test-scaling-normalization to test 6 different combinations of scalers and normalizers

    Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

    --grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

    Purpose:

    Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normailzation options for each data set as well as the best grid search hyperparameters based on the provided ranges.

    Data

    Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queres (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality head set.

    Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

    First number is a classification number to denote what website, query, or vr action is taking place.

    The remaining numbers in each line denote:

    The size of a packet,

    and the direction it is traveling.

    negative numbers denote incoming packets

    positive numbers denote outgoing packets

    Figure 4 Data

    This data uses specific lines from the Virtual Reality.txt file.

    The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

    The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

    The .xlsx and .csv file are identical

    Each file includes (from right to left):

    The origional packet data,

    each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

    and the final Cumulative Distrubution Function (CDF) caluclation that generated the Figure 4 Graph.

  7. i-like-seen.com Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). i-like-seen.com Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/i-like-seen.com/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    i-like-seen.com is ranked #904 in JP with 3.86M Traffic. Categories: . Learn more about website traffic, market share, and more!

  8. d

    Dataset for: Same Question, Different Answers? An Empirical Comparison of...

    • demo-b2find.dkrz.de
    Updated Sep 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Dataset for: Same Question, Different Answers? An Empirical Comparison of Web Data and Traditional Data - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/bc684dad-c657-5013-b2d4-cc35b4a2e7ee
    Explore at:
    Dataset updated
    Sep 22, 2025
    Description

    Psychological scientists increasingly study web data, such as user ratings or social media postings. However, whether research relying on such web data leads to the same conclusions as research based on traditional data is largely unknown. To test this, we (re)analyzed three datasets, thereby comparing web data with lab and online survey data. We calculated correlations across these different datasets (Study 1) and investigated identical, illustrative research questions in each dataset (Studies 2 to 4). Our results suggest that web and traditional data are not fundamentally different and usually lead to similar conclusions, but also that it is important to consider differences between data types such as populations and research settings. Web data can be a valuable tool for psychologists when accounting for such differences, as it allows for testing established research findings in new contexts, complementing them with insights from novel data sources.

  9. Website Classification

    • kaggle.com
    zip
    Updated May 5, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hetul Mehta (2021). Website Classification [Dataset]. https://www.kaggle.com/hetulmehta/website-classification
    Explore at:
    zip(2094838 bytes)Available download formats
    Dataset updated
    May 5, 2021
    Authors
    Hetul Mehta
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset was created by scraping different websites and then classifying them into different categories based on the extracted text.

    Content

    Below are the values each column has. The column names are pretty self-explanatory. website_url: URL link of the website. cleaned_website_text: the cleaned text content extracted from the

  10. same.new Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). same.new Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/same.new/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    same.new is ranked #25576 in IN with 545.43K Traffic. Categories: . Learn more about website traffic, market share, and more!

  11. Watching paid content on websites like Netflix and HBO in Norway 2009-2020

    • statista.com
    Updated Nov 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Watching paid content on websites like Netflix and HBO in Norway 2009-2020 [Dataset]. https://www.statista.com/statistics/981176/watching-paid-content-on-websites-like-netflix-and-hbo-in-norway/
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Norway
    Description

    The share of individuals watching paid content on websites like Netflix and HBO in Norway generally increased from 2009 to 2020. In 2009, the share amounted to three percent of respondents, whereas in 2020 it reached ** percent.

  12. Z

    Curlie Enhanced with LLM Annotations: Two Datasets for Advancing...

    • data-staging.niaid.nih.gov
    Updated Dec 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nutter, Peter; Senghaas, Mika; Cizinsky, Ludek (2023). Curlie Enhanced with LLM Annotations: Two Datasets for Advancing Homepage2Vec's Multilingual Website Classification [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_10413067
    Explore at:
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Czech Technical University in Prague
    École Polytechnique Fédérale de Lausanne
    Authors
    Nutter, Peter; Senghaas, Mika; Cizinsky, Ludek
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification

    This dataset contains two subsets of labeled website data, specifically created to enhance the performance of Homepage2Vec, a multi-label model for website classification. The datasets were generated using Large Language Models (LLMs) to provide more accurate and diverse topic annotations for websites, addressing a limitation of existing Homepage2Vec training data.

    Key Features:

    LLM-generated annotations: Both datasets feature website topic labels generated using LLMs, a novel approach to creating high-quality training data for website classification models.

    Improved multi-label classification: Fine-tuning Homepage2Vec with these datasets has been shown to improve its macro F1 score from 38% to 43% evaluated on a human-labeled dataset, demonstrating their effectiveness in capturing a broader range of website topics.

    Multilingual applicability: The datasets facilitate classification of websites in multiple languages, reflecting the inherent multilingual nature of Homepage2Vec.

    Dataset Composition:

    curlie-gpt3.5-10k: 10,000 websites labeled using GPT-3.5, context 2 and 1-shot

    curlie-gpt4-10k: 10,000 websites labeled using GPT-4, context 2 and zero-shot

    Intended Use:

    Fine-tuning and advancing Homepage2Vec or similar website classification models

    Research on LLM-generated datasets for text classification tasks

    Exploration of multilingual website classification

    Additional Information:

    Project and report repository: https://github.com/CS-433/ml-project-2-mlp

    Acknowledgments:

    This dataset was created as part of a project at EPFL's Data Science Lab (DLab) in collaboration with Prof. Robert West and Tiziano Piccardi.

  13. same.energy Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). same.energy Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/same.energy/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    same.energy is ranked #78327 in US with 323.15K Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!

  14. same-witness.com Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). same-witness.com Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/same-witness.com/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    same-witness.com is ranked #0 in PH with 7.63M Traffic. Categories: . Learn more about website traffic, market share, and more!

  15. Z

    Traffic Acquisition to LAMs Websites

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Apr 30, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ioannis C. Drivas; Dimitrios Kouis (2022). Traffic Acquisition to LAMs Websites [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6505276
    Explore at:
    Dataset updated
    Apr 30, 2022
    Dataset provided by
    Information Management Lab, Archival, Library and Information Studies, University of West Attica
    Authors
    Ioannis C. Drivas; Dimitrios Kouis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary research efforts regarding Social Media Platforms and their contribution to website traffic in LAMs. Through the Similar Web API, the leading social networks (Facebook, Twitter, Youtube, Instagram, Reddit, Pinterest, LinkedIn) that drove traffic to each one of the 220 cases in our dataset were identified and analyzed in the first sheet. Aggregated results proved that Facebook platform was responsible for 46.1% of social traffic (second sheet).

  16. like.vn Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). like.vn Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/like.vn/overview/
    Explore at:
    Dataset updated
    Nov 11, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 11, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    like.vn is ranked #4927 in VN with 101.62K Traffic. Categories: . Learn more about website traffic, market share, and more!

  17. built-different.co Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    Updated Nov 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Semrush (2025). built-different.co Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/built-different.co/overview/
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrushhttps://fr.semrush.com/
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    built-different.co is ranked #24367 in GB with 120.22K Traffic. Categories: . Learn more about website traffic, market share, and more!

  18. Website Elements

    • kaggle.com
    zip
    Updated Oct 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Darsh Kachroo (2024). Website Elements [Dataset]. https://www.kaggle.com/datasets/darsh22blc1378/website-elements
    Explore at:
    zip(597554533 bytes)Available download formats
    Dataset updated
    Oct 4, 2024
    Authors
    Darsh Kachroo
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset consists of images and their respective yolo labels for bounding box prediction. There are 144 classes which are predicted and are mentioned in the data.yaml file.

  19. Effective SEO parameters for all types of websites

    • kaggle.com
    zip
    Updated Jun 16, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ashkan Goharfar (2020). Effective SEO parameters for all types of websites [Dataset]. https://www.kaggle.com/ashkangoharfar/sites-information-data-from-alexacom-dataset
    Explore at:
    zip(3434959 bytes)Available download formats
    Dataset updated
    Jun 16, 2020
    Authors
    Ashkan Goharfar
    Description

    Context

    Alexa Internet rank websites primarily on tracking a sample set of Internet traffic—users of its toolbar for the Internet Explorer, Firefox and Google Chrome web browsers. The Alexa Toolbar includes a popup blocker (which stops unwanted ads), a search box, links to Amazon.com and the Alexa homepage, and the Alexa ranking of the website that the user is visiting. It also allows the user to rate the website and view links to external, relevant websites. Also, Alexa has prepared a list of information for each site for comparison and ranking with other similar sites for each site.

    This dataset is a record of all information on the top websites in each category in Alexa ranking. Source: https://github.com/AshkanGoharfar/Crawler_for_alexa.com

    Content

    This dataset includes several site data, which were achieved from "alexa.com/siteinfo" (for example alexa.com/siteinfo/facebook.com). Data is included for the top 50 websites for every 550 categories in Alexa ranking. (The dataset was obtained for about 22000 sites.) The data also includes keyword opportunities breakdown fields, which vary between categories. As well as each site has important parameters like all_topics_top_keywords_search_traffic_parameter which represent search traffics in competitor websites to this site. For more details about each site's data, you can find the site's name and site's information in the dataset and you can search alexa.com/siteinfo/SiteName link to understand each parameter and columns in the dataset.

    Acknowledgements

    This dataset was collected using the selenium library and chrome web driver to crawl alexa.com data with python language.

    Provider: Ashkan Goharfar, ashkan_goharfar@aut.ac.ir, Department of Computer Engineering and Information Technology, Amirkabir University of Technology

    Citation Request

    A. Risheh, A. Goharfar, and N. T. Javan, "Clustering Alexa Internet Data using Auto Encoder Network and Affinity Propagation," 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 2020, pp. 437-443, doi: 10.1109/ICCKE50421.2020.9303705.

    Inspiration

    Possible uses for this dataset could include:

    Sentiment analysis in a variety of forms. Categorizing websites based on their competitor websites, daily time on the website and Keyword opportunities.

    Analyzing what factors affect on Comparison metrics search traffic, Comparison metrics data, Audience overlap sites overlap scores, top keywords share of voice, top keywords search traffic, optimization opportunities organic share of voice, Optimization opportunities search popularity, Buyer keywords organic competition, Buyer keywords Avg traffic, Easy to rank keywords search pop, Easy to rank keywords relevance to site, Keyword gaps search popularity, Keyword gaps Avg traffic and Keywords search traffic.

    Training ML algorithms like RNNs to generate a probability for each site in each category to being SEO by Google.

    Use NLP for columns like keyword gaps name, Easy to rank keywords name, Buyer keywords name, optimization opportunities name, Top keywords name and Audience overlap similar sites to this site.

  20. Color palettes

    • kaggle.com
    zip
    Updated Jul 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    PatnaikuniMohit (2025). Color palettes [Dataset]. https://www.kaggle.com/datasets/patnaikunimohit/color-palettes/code
    Explore at:
    zip(46303 bytes)Available download formats
    Dataset updated
    Jul 15, 2025
    Authors
    PatnaikuniMohit
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by PatnaikuniMohit

    Released under MIT

    Contents

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
WebTechSurvey (2025). Websites using Similar Posts Ontology [Dataset]. https://webtechsurvey.com/technology/similar-posts-ontology

Websites using Similar Posts Ontology

Explore at:
csvAvailable download formats
Dataset updated
Oct 11, 2025
Dataset authored and provided by
WebTechSurvey
License

https://webtechsurvey.com/termshttps://webtechsurvey.com/terms

Time period covered
2025
Area covered
Global
Description

A complete list of live websites using the Similar Posts Ontology technology, compiled through global website indexing conducted by WebTechSurvey.

Search
Clear search
Close search
Google apps
Main menu