100+ datasets found
  1. Websites using WordPress

    • webtechsurvey.com
    csv
    Updated Apr 4, 2020
    Cite
    WebTechSurvey (2020). Websites using WordPress [Dataset]. https://webtechsurvey.com/technology/wordpress
    Available download formats: csv
    Dataset updated
    Apr 4, 2020
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the WordPress technology, compiled through global website indexing conducted by WebTechSurvey.

  2. Web Page Object Detection Dataset

    • universe.roboflow.com
    zip
    Updated Mar 2, 2023
    Cite
    web page summarizer (2023). Web Page Object Detection Dataset [Dataset]. https://universe.roboflow.com/web-page-summarizer/web-page-object-detection
    Available download formats: zip
    Dataset updated
    Mar 2, 2023
    Dataset authored and provided by
    web page summarizer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Web Page Elements Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Web Accessibility Improvement: The "Web Page Object Detection" model can be used to identify and label various elements on a web page, making it easier for people with visual impairments to navigate and interact with websites using screen readers and other assistive technologies.

    2. Web Design Analysis: The model can be employed to analyze the structure and layout of popular websites, helping web designers understand best practices and trends in web design. This information can inform the creation of new, user-friendly websites or redesigns of existing pages.

    3. Automatic Web Page Summary Generation: By identifying and extracting key elements, such as titles, headings, content blocks, and lists, the model can assist in generating concise summaries of web pages, which can aid users in their search for relevant information.

    4. Web Page Conversion and Optimization: The model can be used to detect redundant or unnecessary elements on a web page and suggest their removal or modification, leading to cleaner designs and faster-loading pages. This can improve user experience and, potentially, search engine rankings.

    5. Assisting Web Developers in Debugging and Testing: By detecting web page elements, the model can help identify inconsistencies or errors in a site's code or design, such as missing or misaligned elements, allowing developers to quickly diagnose and address these issues.
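    As a rough illustration of use case 5, the sketch below flags detected elements whose left edges are almost, but not exactly, aligned. It assumes each detection is an axis-aligned box given in pixels as (x, y, width, height) with a class label; the class names, the box format and the 4-pixel tolerance are hypothetical choices, not taken from the dataset.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Box:
    """Detected web page element as an axis-aligned box (assumed pixel format)."""
    label: str  # hypothetical class name, e.g. "heading"
    x: float    # left edge
    y: float    # top edge
    w: float    # width
    h: float    # height


def near_misaligned(boxes, tolerance=4.0):
    """Return (label_a, label_b, gap) for left edges that differ by 0 < gap <= tolerance.

    Exactly equal edges are treated as intentional alignment; edges only a few
    pixels apart are reported as likely misalignments.
    """
    flagged = []
    for a, b in combinations(boxes, 2):
        gap = abs(a.x - b.x)
        if 0 < gap <= tolerance:
            flagged.append((a.label, b.label, gap))
    return flagged


boxes = [
    Box("heading", 40, 20, 600, 48),
    Box("paragraph", 40, 80, 600, 200),
    Box("list", 43, 300, 580, 120),
]
print(near_misaligned(boxes))  # [('heading', 'list', 3), ('paragraph', 'list', 3)]
```

    Comparing only left edges keeps the example short; a fuller check would likely also compare right edges and vertical spacing.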

  3. Web page phishing detection

    • data.mendeley.com
    Updated Sep 28, 2020
    Cite
    Abdelhakim Hannousse (2020). Web page phishing detection [Dataset]. http://doi.org/10.17632/c2gw7fy2j4.2
    Dataset updated
    Sep 28, 2020
    Authors
    Abdelhakim Hannousse
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The provided dataset includes 11430 URLs with 87 extracted features. The dataset is designed to be used as a benchmark for machine-learning-based phishing detection systems. The features belong to three classes: 56 are extracted from the structure and syntax of the URLs, 24 from the content of the corresponding pages, and 7 by querying external services. The dataset is balanced: it contains exactly 50% phishing and 50% legitimate URLs. Along with the dataset, we provide the Python scripts used to extract the features, for potential replication or extension.

    dataset_A: contains a list of URLs together with their DOM tree objects, which can be used for replication and for experimenting with new URL- and content-based features, overcoming the short lifespan of phishing web pages.

    dataset_B: contains the extracted feature values, which can be used directly as input to classifiers for examination. Note that the data in this dataset are indexed by URL, so the index needs to be removed before experimentation.

    The datasets were constructed in May 2020. Due to the huge size of dataset_A, only a sample of it is provided; it will be divided into sample files and uploaded one by one. If you urgently need a full copy, please contact the author directly at: hannousse.abdelhakim@univ-guelma.dz
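    The note on dataset_B (feature values indexed by URL) can be illustrated with a minimal sketch that drops the URL index before building a feature matrix. It uses only the Python standard library and a tiny in-memory sample; the column names `url`, `length_url`, `nb_dots` and `status` and the label values `phishing`/`legitimate` are assumptions for illustration, so check them against the actual file header.

```python
import csv
import io

# Hypothetical two-row excerpt shaped like dataset_B: a URL index column,
# numeric feature columns and a label column (all column names assumed).
SAMPLE = """url,length_url,nb_dots,status
http://example.com/login,24,1,phishing
https://example.org/,20,1,legitimate
"""


def load_features(text, index_col="url", label_col="status"):
    """Drop the URL index, then split each row into numeric features and a 0/1 label."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        row.pop(index_col)  # the URL is an index, not a feature
        label = row.pop(label_col)
        X.append([float(v) for v in row.values()])
        y.append(1 if label == "phishing" else 0)
    return X, y


def is_balanced(labels):
    """True when exactly half of the 0/1 labels are 1 (phishing)."""
    return 2 * sum(labels) == len(labels)


X, y = load_features(SAMPLE)
print(X, y)  # [[24.0, 1.0], [20.0, 1.0]] [1, 0]
```

    On the full file, `is_balanced(y)` should hold, since the description states an exact 50/50 split of the 11430 URLs.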

  4. Websites using Simple File List

    • webtechsurvey.com
    csv
    Cite
    WebTechSurvey, Websites using Simple File List [Dataset]. https://webtechsurvey.com/technology/simple-file-list
    Available download formats: csv
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Simple File List technology, compiled through global website indexing conducted by WebTechSurvey.

  5. Websites using Web Page Maker

    • webtechsurvey.com
    csv
    Updated Jul 26, 2023
    Cite
    WebTechSurvey (2023). Websites using Web Page Maker [Dataset]. https://webtechsurvey.com/technology/web-page-maker
    Available download formats: csv
    Dataset updated
    Jul 26, 2023
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Web Page Maker technology, compiled through global website indexing conducted by WebTechSurvey.

  6. List of websites with CS projects information

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 22, 2022
    Cite
    TIDE-UPF (2022). List of websites with CS projects information [Dataset]. http://doi.org/10.5281/zenodo.7310295
    Dataset updated
    Nov 22, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    TIDE-UPF
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the list of websites from which TIDE-UPF extracted the CS projects information.

  7. Open central government websites - February 2013

    • gov.uk
    Updated Jul 9, 2013
    Cite
    Cabinet Office (2013). Open central government websites - February 2013 [Dataset]. https://www.gov.uk/government/publications/open-central-government-websites-february-2013
    Dataset updated
    Jul 9, 2013
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Cabinet Office
    Description

    Background

    Number and list of open central government websites – 474 as of 13 February 2013.

    Information was reported as correct by central government departments at 13 February 2013.

    The Cabinet Office committed to begin quarterly publication of the number of open websites starting in financial year 2011.

    Definition of a website

    The definition of a website used here is user-centric. Something is counted as a separate website if it is active and either has a separate domain name or, if it is a subdomain, the user cannot move freely between the subsite and the parent site and there is no family likeness in the design. In other words, if users experience it as a separate site in their normal browsing, searching and interaction, it is counted as one.
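    The user-centric rule above combines several conditions. A hypothetical encoding as a Python predicate makes the logic explicit; the function and parameter names are illustrative and are not part of the Cabinet Office publication.

```python
def counts_as_separate_website(active, own_domain, user_can_move_freely, family_likeness):
    """Hypothetical encoding of the user-centric website definition.

    A site counts separately if it is active and either:
    - has its own domain name, or
    - is a subdomain where the user cannot move freely to and from the parent
      site AND there is no family likeness in the design.
    """
    if not active:
        return False
    if own_domain:
        return True
    return not user_can_move_freely and not family_likeness


# A subdomain with its own navigation and design counts as a separate site.
print(counts_as_separate_website(True, False, False, False))  # True
```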

    Definition of a closed website

    A website is considered closed when it ceases to be actively funded, run and managed by central government, either by packaging information and putting it in the right place for the intended audience on another website or digital channel, or by a third party taking and managing it and bearing the cost. Where appropriate, domains stay operational in order to redirect users to the UK Government Website Archive (http://www.nationalarchives.gov.uk/webarchive/).

    Explanation for increase in sites reported

    Since the previous quarterly report of 22 October 2012, an extra 124 sites have been reported. This increase is due to a change in the scope of the audit: the Government Digital Service (GDS) felt that the previous method of using The National Archives database to source this information was not capturing the required data sufficiently or accurately. The new process and scope have resulted in more websites being included, for example Directgov URLs, dot independent sites and national parks. In addition, the latest GOV.UK exemption process has brought to our attention many more sites than we were previously aware of.

    Definition of the exemption process

    The GOV.UK exemption process began with a web rationalisation of the government’s Internet estate to reduce the number of obsolete websites and to establish the scale of the websites that the government owns.

    Exclusions from the central government list

    Not included in the number or list are websites of public corporations as listed on the Office for National Statistics website, partnerships more than half-funded by the private sector, charities and national museums. Specialist closed-audience functions, such as the BIS Research Councils, BIS Sector Skills Councils and Industrial Training Boards, and the Defra Levy Boards and their websites, are not included in this data. The Ministry of Defence conducted its own rationalisation of MOD and armed forces sites as an integral part of the Website Review; military sites belonging to a particular service are excluded from this dataset. Finally, public bodies set up by Parliament that report directly to the Speaker’s Committee, and report through a ministerial government department only for the purposes of enacting legislation, are also excluded (for example, the Electoral Commission and IPSA).

    Inclusion under department name

    Websites are listed under the department name for which the minister in HMG has responsibility, either directly through their departmental activities, or indirectly through being the minister reporting to Parliament for independent bodies set up by statute.

    List of open websites

    For re-usability, these are provided as Excel and CSV files.

  8. Lists of websites and services in .gouv.fr

    • gimi9.com
    Updated Dec 19, 2024
    Cite
    (2024). Lists of websites and services in .gouv.fr [Dataset]. https://gimi9.com/dataset/eu_5d64f85e8b4c415bb5166012/
    Dataset updated
    Dec 19, 2024
    Description

    Following the list of websites in ‘.gouv.fr’ generated in the GitHub repository gouvfrlist, here is a list of websites and web services in ‘.gouv.fr’. This list made it possible to build a graph representation of domains and subdomains by ministry and by (deconcentrated) administration. We also relied on the list of the top 250 administrative procedures and on the 2014 list of .gouv.fr sites.

    Graphic representation of .gouv.fr websites

    GitHub repository

    The project description and dataset are available in the GitHub repository graph-gouv-en by jbledevehat.

    Legend

    The objects represented are:

    • the President of the French Republic and the Prime Minister, qualified as “Person” (in blue)
    • departments or administrative branches (in yellow)
    • websites (in green)
    • the subdomains of these websites (in orange)
    • online services (in red)
    • citizen consultation sites (in pink)
    • websites and services that are archived or inactive (in black)
    • undefined (incoherent) websites (in grey)

    Web publication

    This representation is available on the KUMU application at the following address: https://kumu.io/jbledevehat/sites-web-gouvfr#liste-des-sites-web-en-gouvfr-v1
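    One way to derive the domain/subdomain layer of such a graph from a list of hostnames is sketched below. It assumes plain hostnames ending in `.gouv.fr` and treats the label directly before `.gouv.fr` as the parent domain; the hostnames in the example are hypothetical, and the project's own method for building its graph may differ.

```python
from collections import defaultdict

SUFFIX = ".gouv.fr"


def group_by_domain(hostnames):
    """Map each parent domain under .gouv.fr to the sorted list of its subdomains."""
    groups = defaultdict(set)
    for host in hostnames:
        host = host.strip().lower().rstrip(".")
        if not host.endswith(SUFFIX):
            continue  # outside .gouv.fr: not part of this graph
        labels = host[: -len(SUFFIX)].split(".")
        parent = labels[-1] + SUFFIX
        subdomains = groups[parent]  # creates the domain node if it is new
        if host != parent:
            subdomains.add(host)
    return {domain: sorted(subs) for domain, subs in groups.items()}


def edges(groups):
    """List (domain, subdomain) edges, e.g. for import into a graph tool."""
    return [(d, s) for d, subs in sorted(groups.items()) for s in subs]


# Hypothetical hostnames for illustration.
hosts = ["www.example.gouv.fr", "api.example.gouv.fr", "demo.gouv.fr", "www.example.org"]
g = group_by_domain(hosts)
print(edges(g))
```

    Ministries and node colours from the legend would be extra attributes attached to these domain nodes.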

  9. Most popular websites in the Netherlands 2015 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Jun 2, 2017
    Cite
    (2017). Most popular websites in the Netherlands 2015 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c47411e6-3cbb-5381-b5b5-e17c0aa87cde
    Dataset updated
    Jun 2, 2017
    Area covered
    Netherlands
    Description

    This dataset contains a list of 3654 Dutch websites that we considered the most popular websites in 2015. This list served as a whitelist for the Newstracker research project, in which we monitored the online web behaviour of a group of respondents. The research project 'The Newstracker' was a subproject of the NWO-funded project 'The New News Consumer: A User-Based Innovation Project to Meet Paradigmatic Change in News Use and Media Habits'. For the Newstracker project, we aimed to understand the web behaviour of a group of respondents. We created custom-built software to monitor their web browsing behaviour on their laptops and desktops (the code is available in open access at https://github.com/NITechLabs/NewsTracker). For reasons of scale and privacy, we created a whitelist of the websites that were most popular in 2015. We compiled this list manually, using data from DDMM, Alexa and our own research. The dataset consists of 5 columns:

    - the URL
    - the type of website: we created a list of website types, and each website was manually labelled with 1 category
    - Nieuws-regio (news region): when the category was 'News', we subdivided these websites by regional focus: International, National or Local
    - Nieuws-onderwerp (news topic): each website in the News category was further subdivided by type of news website, using our own list of news categories, with each website coded manually
    - Bron (source): for each website, we noted which source we used to find it

    The full description of the research design of the Newstracker, including the set-up of this whitelist, is given in the following article: Kleppe, M., Otte, M. (in print), 'Analysing & understanding news consumption patterns by tracking online user behaviour with a multimodal research design', Digital Scholarship in the Humanities, doi 10.1093/llc/fqx030.
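    One way to apply a whitelist like this is to keep only visits to whitelisted hosts. The sketch below shows such a filter using Python's standard `urllib.parse`; it is not taken from the NewsTracker code, and the rule that a subdomain of a whitelisted host also matches, as well as the example URLs, are assumptions.

```python
from urllib.parse import urlsplit


def normalize_host(url):
    """Lower-cased hostname of a URL (scheme optional), without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_whitelisted(url, whitelist):
    """True if the host equals a whitelisted host or is a subdomain of one."""
    host = normalize_host(url)
    return any(host == w or host.endswith("." + w) for w in whitelist)


# Hypothetical whitelist entries and browsing events.
whitelist = {normalize_host(u) for u in ["www.example.nl", "news.example.org"]}
visits = [
    "https://www.example.nl/economie",
    "https://sport.example.nl/",
    "https://news.example.org/a",
    "https://example.com/",
]
kept = [v for v in visits if is_whitelisted(v, whitelist)]
print(kept)
```

    Matching on hosts rather than full URLs keeps the whitelist short; whether subdomains should match is a design decision the real project may have made differently.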

  10. Leading websites worldwide 2024, by monthly visits

    • statista.com
    • old-kremlin.ru
    Updated Mar 24, 2025
    Cite
    Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
    Dataset updated
    Mar 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2024
    Area covered
    Worldwide
    Description

    In November 2024, Google.com was the most popular website worldwide, with 136 billion average monthly visits. The online platform has held the top spot since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

    The internet leaders: search, social, and e-commerce

    Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user-generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-commerce sector an integral part of the global retail scene.

    What is next for online content?

    User-generated content, which powers social media and websites like Reddit and Wikipedia, keeps the internet’s engines running. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal to license Reddit content for training large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market may bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.

  11. LAT Bright Source List

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Apr 24, 2025
    Cite
    National Aeronautics and Space Administration (2025). LAT Bright Source List [Dataset]. https://catalog.data.gov/dataset/lat-bright-source-list
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Fermi Gamma-ray Space Telescope (Fermi) Large Area Telescope (LAT) is a successor to EGRET, with greatly improved sensitivity, resolution, and energy range. This web page presents the first full catalog of LAT sources, based on the first eleven months of survey data. For a full explanation about the catalog and its construction see the LAT 1-year Catalog Paper.

  12. Open central government websites – January 2014

    • gov.uk
    Updated Jan 31, 2014
    Cite
    Cabinet Office (2014). Open central government websites – January 2014 [Dataset]. https://www.gov.uk/government/publications/open-central-government-websites-january-2014
    Dataset updated
    Jan 31, 2014
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Cabinet Office
    Description

    Number and list of central government open websites – 455 as at 31 December 2013.

    The Cabinet Office committed to begin quarterly publication of the number of open websites starting in the financial year 2011.

    Definition of a website

    The definition used is user-centric. Something is counted as a separate website if it is active and either has a separate domain name or, if it is a subdomain, the user cannot move freely between the subsite and the parent site and there is no family likeness in the design. In other words, if users experience it as a separate site in their normal browsing, searching and interaction, it is counted as one.

    Definition of a closed website

    A website is considered closed when it ceases to be actively funded, run and managed by central government, either by packaging information and putting it in the right place for the intended audience on another website or digital channel, or by a third party taking and managing it and bearing the cost. Where appropriate, domains stay operational in order to redirect users to the UK Government Website Archive (http://www.nationalarchives.gov.uk/webarchive/).

    Definition of the exemption process

    The GOV.UK exemption process began with a web rationalisation of the government’s internet estate to reduce the number of obsolete websites and to establish the scale of the websites that the government owns.

    Exclusions from the central government list

    Not included in the number or list are:

    • websites of public corporations as listed on the Office for National Statistics website (http://www.ons.gov.uk/ons/publications/re-reference-tables.html?edition=tcm%3A77-329008)
    • partnerships more than half-funded by the private sector
    • charities and national museums
    • specialist closed audience functions, such as the BIS Research Councils, BIS Sector Skills Councils and Industrial Training Boards, and the Defra Levy Boards and their websites

    Finally, those public bodies set up by Parliament and reporting directly to the Speaker’s Committee are also excluded (for example, the Electoral Commission and IPSA).

    As agreed in the quarterly report of February 2013, the following sites have been included in the list:

    • ‘.independent’ sites
    • National parks

    Inclusion under department name

    Websites are listed under the department name for which the government minister has responsibility, either directly through their departmental activities, or indirectly through being the minister reporting to Parliament for independent bodies set up by statute.

    January 2014 report

    Government website domains have been procured since as early as the 1990s, and at that time there was no requirement for government departments to keep a formal record of ownership. With staff changes and the formation of new departments, it became apparent that departments did not have a complete view of all the sites in their estate.

    Government Digital Service (GDS) has worked closely with these departments to identify legacy websites we were not originally aware of, by going through the complete list of gov.uk domains managed by Cabinet Office under the second-level domain (SLD) gov.uk. A full list of gov.uk domains can be viewed here. As well as websites on the gov.uk SLD, we found a number of legacy websites owned by departments under a .org.uk or .co.uk SLD. Because we do not own these SLDs, information on whether a department owns them was less easily accessible, but a strong working relationship with department leads has since helped to identify the majority of these sites.

    Previously, the Ministry of Defence conducted its own rationalisation of MOD and armed forces sites. At the beginning of this report, we agreed to include these sites to ensure a consistent approach.

    Since the last report of October 2013, 19 websites have closed and 18 have migrated to the government's website, GOV.UK. As government websites migrate to GOV.UK, reporting on a department’s content will become part of overall GOV.UK reporting.

  13. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, used by over 49.4 percent of websites. Spanish ranked second, with six percent of web content, followed by German with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of both countries was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by region

    As of October 2024, there were 5.52 billion internet users worldwide. In the same period, Northern Europe and North America led the world in internet penetration, with around 97 percent of their populations accessing the internet.

  14. U.S. most visited websites 2024, by total visits

    • statista.com
    • ai-chatbox.pro
    Updated Mar 24, 2025
    Cite
    Statista (2025). U.S. most visited websites 2024, by total visits [Dataset]. https://www.statista.com/statistics/1456422/most-visited-websites-total-visits-united-states/
    Dataset updated
    Mar 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2024
    Area covered
    United States
    Description

    In November 2024, Google.com was the most visited website in the United States, with over 25 billion total visits. YouTube.com came in second with 12 billion total visits. Reddit.com and Amazon.com counted approximately 3.12 billion and 2.89 billion visits, respectively, from U.S. online audiences.

  15. Most popular websites in the Netherlands 2015

    • ssh.datastations.nl
    csv, tsv, zip
    Updated May 9, 2017
    Cite
    M. Kleppe; H. Bijleveld (2017). Most popular websites in the Netherlands 2015 [Dataset]. http://doi.org/10.17026/DANS-X6H-6QQT
    Available download formats: zip (15855), csv (138294), tsv (176359)
    Dataset updated
    May 9, 2017
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    M. Kleppe; H. Bijleveld
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Netherlands
    Dataset funded by
    NWO
    Description

    This dataset contains a list of 3654 Dutch websites that we considered the most popular websites in 2015. This list served as a whitelist for the Newstracker research project, in which we monitored the online web behaviour of a group of respondents. The research project 'The Newstracker' was a subproject of the NWO-funded project 'The New News Consumer: A User-Based Innovation Project to Meet Paradigmatic Change in News Use and Media Habits'. For the Newstracker project, we aimed to understand the web behaviour of a group of respondents. We created custom-built software to monitor their web browsing behaviour on their laptops and desktops (the code is available in open access at https://github.com/NITechLabs/NewsTracker). For reasons of scale and privacy, we created a whitelist of the websites that were most popular in 2015. We compiled this list manually, using data from DDMM, Alexa and our own research. The dataset consists of 5 columns:

    - the URL
    - the type of website: we created a list of website types, and each website was manually labelled with 1 category
    - Nieuws-regio (news region): when the category was 'News', we subdivided these websites by regional focus: International, National or Local
    - Nieuws-onderwerp (news topic): each website in the News category was further subdivided by type of news website, using our own list of news categories, with each website coded manually
    - Bron (source): for each website, we noted which source we used to find it

    The full description of the research design of the Newstracker, including the set-up of this whitelist, is given in the following article: Kleppe, M., Otte, M. (in print), 'Analysing & understanding news consumption patterns by tracking online user behaviour with a multimodal research design', Digital Scholarship in the Humanities, doi 10.1093/llc/fqx030.
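    Because the five columns are described explicitly, a short standard-library sketch can show how the news sites might be counted per Nieuws-regio (news region). The header names `URL` and `Type`, the comma delimiter and the sample rows are assumptions; only `Nieuws-regio`, `Nieuws-onderwerp` and `Bron` are named in the description, so adjust the names to the actual csv/tsv download.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the five-column layout described above.
SAMPLE = """URL,Type,Nieuws-regio,Nieuws-onderwerp,Bron
news.example.nl,News,National,General,Alexa
local.example.nl,News,Local,Regional,DDMM
shop.example.nl,E-commerce,,,Own research
"""


def news_by_region(text):
    """Count websites of type 'News' per value of the Nieuws-regio column."""
    rows = csv.DictReader(io.StringIO(text))
    return Counter(r["Nieuws-regio"] for r in rows if r["Type"] == "News")


print(news_by_region(SAMPLE))
```

    For the tsv download, passing `delimiter="\t"` to `csv.DictReader` would be the corresponding change.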

  16. Health and Human Services Facilities List

    • catalog.data.gov
    • data.montgomerycountymd.gov
    Updated Sep 15, 2023
    Cite
    data.montgomerycountymd.gov (2023). Health and Human Services Facilities List [Dataset]. https://catalog.data.gov/dataset/health-and-human-services-facilities-list
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    List of Health and Human Services facilities and available programs, with contact information, hours of operation and web page links. This dataset is updated on an as-needed basis.

  17. Civil Service List (Terminated)

    • data.amerigeoss.org
    • data.cityofnewyork.us
    csv, json, rdf, xml
    Updated Jul 1, 2019
    Cite
    United States (2019). Civil Service List (Terminated) [Dataset]. https://data.amerigeoss.org/sv/dataset/civil-service-list-terminated
    Available download formats: csv, rdf, xml, json
    Dataset updated
    Jul 1, 2019
    Dataset provided by
    United States
    Description

    A Civil Service List is considered terminated usually four years after the list has been established, unless it is extended at the Commissioner’s discretion. For more information visit DCAS’ “Work for the City” webpage at: https://www1.nyc.gov/site/dcas/employment/take-an-exam.page.
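    The usual four-year rule can be expressed as a small date calculation. The helper below is hypothetical: it ignores extensions granted at the Commissioner’s discretion and maps 29 February to 28 February when the target year is not a leap year.

```python
from datetime import date


def expected_termination(established: date, years: int = 4) -> date:
    """Hypothetical helper: usual termination date, `years` after establishment.

    Extensions granted at the Commissioner's discretion are not modelled.
    A list established on 29 February maps to 28 February if the target
    year is not a leap year.
    """
    try:
        return established.replace(year=established.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return established.replace(year=established.year + years, day=28)


print(expected_termination(date(2015, 6, 1)))  # 2019-06-01
```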

  18. Leading websites worldwide 2024, by unique visitors

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Leading websites worldwide 2024, by unique visitors [Dataset]. https://www.statista.com/statistics/1201889/most-visited-websites-worldwide-unique-visits/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2024
    Area covered
    Worldwide
    Description

    In November 2024, Google.com was the most popular website worldwide with approximately 6.25 billion unique monthly visitors. YouTube.com was ranked second with an estimated 3.64 billion unique monthly visitors. Both websites are among the most visited websites worldwide.

  19. Data from: HTTPS traffic classification

    • kaggle.com
    Updated Mar 11, 2024
    Cite
    Đinh Ngọc Ân (2024). HTTPS traffic classification [Dataset]. https://www.kaggle.com/datasets/inhngcn/https-traffic-classification/code
    Available download formats: Croissant
    Dataset updated
    Mar 11, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Đinh Ngọc Ân
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Researchers from the Czech Republic have published a dataset for HTTPS traffic classification.

    Since the data were captured mainly in a real backbone network, they omitted IP addresses and ports. The dataset consists of values calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.

    During their research, they divided HTTPS traffic into five categories: L (Live Video Streaming), P (Video Player), M (Music Player), U/D (File Upload/Download), and W (Website and other traffic).

    They chose representative services known for each traffic type based on the Alexa Top 1M list and Moz's list of the 500 most popular websites. They also used several popular websites that primarily target Czech audiences. The identified traffic classes and their representatives are:

    • Live Video Stream: Twitch, Czech TV, YouTube Live
    • Video Player: DailyMotion, Stream.cz, Vimeo, YouTube
    • Music Player: AppleMusic, Spotify, SoundCloud
    • File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive
    • Website and Other Traffic: websites from the Alexa Top 1M list
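    The flow records described above contain sequences of packet lengths and bursts. The sketch below derives per-direction totals and a burst sequence from one flow, taking a burst to be a run of consecutive packets in the same direction. It assumes lengths in bytes and a direction encoding of +1 (client to server) and -1 (server to client); this encoding, the burst definition and the feature names are illustrative assumptions, not the dataset's actual schema.

```python
def bursts(lengths, directions):
    """Group consecutive same-direction packets into (direction, packets, bytes) tuples."""
    out = []
    for size, d in zip(lengths, directions):
        if out and out[-1][0] == d:
            _, n, total = out[-1]
            out[-1] = (d, n + 1, total + size)
        else:
            out.append((d, 1, size))
    return out


def flow_summary(lengths, directions):
    """Per-direction packet and byte counts for one bidirectional flow."""
    up = [size for size, d in zip(lengths, directions) if d > 0]
    down = [size for size, d in zip(lengths, directions) if d < 0]
    return {"pkts_up": len(up), "bytes_up": sum(up),
            "pkts_down": len(down), "bytes_down": sum(down)}


lengths = [517, 1460, 1460, 90, 1460]
directions = [1, -1, -1, 1, -1]
print(bursts(lengths, directions))
# [(1, 1, 517), (-1, 2, 2920), (1, 1, 90), (-1, 1, 1460)]
print(flow_summary(lengths, directions))
# {'pkts_up': 2, 'bytes_up': 607, 'pkts_down': 3, 'bytes_down': 4380}
```

    Summaries like these could serve as classifier inputs; large downstream bursts, for example, would be expected more often in video and download traffic than in uploads.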

  20. Data for: Machine Learning based Heterogeneous Web Advertisements Detection...

    • data.mendeley.com
    • narcis.nl
    Updated Jun 29, 2018
    Cite
    KS Kuppusamy (2018). Data for: Machine Learning based Heterogeneous Web Advertisements Detection Using a Diverse Feature Set [Dataset]. http://doi.org/10.17632/5bzh52txpn.1
    Dataset updated
    Jun 29, 2018
    Authors
    KS Kuppusamy
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Advertisement identification and filtering in web pages have gained significance due to factors such as accessibility, security, privacy, and obtrusiveness. Current practice relies on maintaining URL-based regular expressions called filter lists: each URL obtained from a web page is matched against the filter list. While effective, this procedure lacks scalability because the filter list demands constant maintenance. To counter these limitations, we devise a machine-learning-based advertisement detection system using a diverse feature set that can distinguish advertisement blocks from non-advertisement blocks. The method can serve as a basis for accessibility features such as smooth browsing and text summarization for people with visual impairments, cognitive impairments, and photosensitive epilepsy. A classifier trained on the proposed feature set achieves 93.4% accuracy in identifying advertisements.
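    The filter-list baseline that the description contrasts with can be sketched in a few lines: every URL on a page is tested against a list of regular expressions. The patterns and URLs below are hypothetical and written as plain Python regular expressions; real filter lists such as EasyList use their own rule syntax. Each pattern has to be written and maintained by hand, which is the scalability problem the authors point to.

```python
import re

# Hypothetical filter-list entries as Python regular expressions.
FILTER_LIST = [
    r"^https?://([a-z0-9-]+\.)*ads\.example\.com/",  # a known ad-serving host
    r"/banners?/",                                   # banner image paths
    r"[?&]ad_id=",                                   # ad-tracking query parameter
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in FILTER_LIST]


def is_ad_url(url):
    """Return True if any filter-list pattern matches the URL."""
    return any(p.search(url) for p in COMPILED)


page_urls = [
    "https://ads.example.com/slot/1.js",
    "https://cdn.example.org/img/banners/top.png",
    "https://www.example.org/article?id=7",
    "https://www.example.org/track?ad_id=42",
]
blocked = [u for u in page_urls if is_ad_url(u)]
print(blocked)
```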

Websites using WordPress (WebTechSurvey, 2020): 50 scholarly articles cite this dataset.