As of December 2024, there were around **** million websites registered in China in total. This represents an increase from around **** million at the end of 2023.
In November 2024, Google.com was the most popular website worldwide, with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.
The internet leaders: search, social, and e-commerce
Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user-generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.
What is next for online content?
Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
The total number of visitors to government websites in the last minute.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data consist of a collection of legitimate and phishing website instances. Each website is represented by a set of features that denote whether the website is legitimate or not. The data can serve as input for a machine learning process.
Here, the two variants of the Phishing Dataset are presented.
Full variant - dataset_full.csv
Small variant - dataset_small.csv
Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. The mobile share consistently hovered around the 50 percent mark from the beginning of 2017 before surpassing it in 2020.
Mobile traffic
Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight to mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of web traffic. By contrast, mobile makes up only around 45.49 percent of online traffic in the United States.
Mobile usage
The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go, and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
As many general retailers and mass distribution channels experienced exponential growth during the months of the COVID-19 lockdown in France, the source set out to measure the total number of backlinks pointing to the different retailers' websites. Carrefour.fr was the leading general retailer, with ***** backlinks to its website. This backlink acquisition strategy seems to be paying off for the major retailer, which drew around ***** percent of its overall traffic from backlinks.
In Saudi Arabia, the most visited website as of November 2024 was Google.com, with around **** billion monthly visits. The second most-visited website was YouTube.com, with *** billion monthly visits in total.
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined by a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest.
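The two counts described above can be illustrated with a minimal Python sketch. The log format of (day, user_id, url) tuples is a hypothetical one chosen for the example, and the deduplication rule is read here as "repeated requests by the same user for the same URL on the same day count once", which is an assumption. How the two counts are combined into a rank is not specified in the description, so the sketch stops at the counts.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def daily_metrics(requests):
    """Return {(site, day): (unique_visitors, pageviews)}.

    `requests` is an iterable of (day, user_id, url) tuples; this log
    format is an assumption made for the example.
    """
    visitors = defaultdict(set)  # (site, day) -> users seen that day
    views = defaultdict(set)     # (site, day) -> distinct (user, url) pairs
    for day, user_id, url in requests:
        site = urlsplit(url).netloc
        visitors[(site, day)].add(user_id)
        # Repeated requests by the same user for the same URL on the same
        # day collapse into one pageview (assumed reading of the rule).
        views[(site, day)].add((user_id, url))
    return {key: (len(visitors[key]), len(views[key])) for key in visitors}


log = [
    ("2024-01-01", "u1", "https://example.com/"),
    ("2024-01-01", "u1", "https://example.com/"),       # repeat, counted once
    ("2024-01-01", "u1", "https://example.com/about"),
    ("2024-01-01", "u2", "https://example.com/"),
]
metrics = daily_metrics(log)
print(metrics)
```

Using sets keeps the deduplication implicit: adding the same (user, url) pair twice leaves the set unchanged.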
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS certificate fields, and GeoIP information for 432,572 verified benign domains from Cisco Umbrella and 36,993 verified phishing domains from the PhishTank and OpenPhish services. The dataset is useful for statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g., for phishing detection. The data was collected between March and July 2023. The final assessment of the data was conducted in July 2023 (this is why the names are suffixed with _2307).
The upload contains: a) data files, b) the description of the data structure, and c) the feature vector we used for ML-based phishing domain detection.
The data is located in two individual files:
Both files are in the JSON Array format. The structure is as follows:
[
  {
    "_id" : "A unique ID of the data record",
    "domain_name" : "Name of the domain (e.g., zenodo.com)",
    "dns" : { "//": "Data obtained from DNS records" },
    "evaluated_on" : "// ISO Timestamp of data collection",
    "ip_data" : [ "// Data for each related IP address",
      {
        "//": "IP-related data, including RTT from ICMP echo attempts (from Brno, Czechia)",
        "//": "WHOIS/RDAP data for the given IP address",
        "//": "GeoIP data for the given IP address",
        "//": "NERD system reputation score (if available)",
        "//": "ASN info",
        "//": "remarks: ISO timestamps of collection of the individual data pieces"
      }
    ],
    "label" : "benign_2307 for benign OR misp_2307 for phishing",
    "rdap" : { "//": "WHOIS/RDAP information for the domain name" },
    "remarks" : {
      "dns_evaluated_on" : "ISO Timestamp of DNS data collection",
      "rdap_evaluated_on" : "ISO Timestamp of WHOIS/RDAP data collection",
      "tls_evaluated_on" : "ISO Timestamp of TLS certificate information collection",
      "dns_had_no_ips" : "true if no IPs were found in DNS records"
    },
    "sourced_on" : "ISO Timestamp of the moment the domain was found",
    "tls" : {
      "cipher" : "Identifier of the TLS cipher suite",
      "count" : "Number of certificates in chain",
      "protocol" : "Version of the TLS protocol",
      "certificates" : [
        { "//": "Information from TLS certificate fields: issuer, extensions, etc." }
      ]
    },
    "category" : "Category of the record (could be ignored)",
    "source" : "Name of the file that we used to save the domain list"
  }
]
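As a rough illustration of working with records in the structure above, the following Python sketch tallies the "label" field and counts records flagged with "dns_had_no_ips". The two sample records and the summarize helper are invented for the example, and the sketch assumes "dns_had_no_ips" is stored as a JSON boolean, which the description does not confirm. For a real data file, json.load on the opened file would replace the inline json.loads call.

```python
import json
from collections import Counter


def summarize(records):
    """Count labels and the records whose DNS lookup returned no IPs."""
    labels = Counter(record.get("label") for record in records)
    # "remarks" may be absent or null in some records, so fall back to {}.
    no_ips = sum(
        1
        for record in records
        if (record.get("remarks") or {}).get("dns_had_no_ips") is True
    )
    return labels, no_ips


# Two made-up records that follow the documented field names.
sample = json.loads("""
[
  {"domain_name": "example.com", "label": "benign_2307",
   "remarks": {"dns_had_no_ips": false}},
  {"domain_name": "phish.example", "label": "misp_2307",
   "remarks": {"dns_had_no_ips": true}}
]
""")

labels, no_ips = summarize(sample)
print(labels, no_ips)
```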
This section describes the feature vector used in the paper "Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence", which was accepted to the IEEE NOMS 2024 conference.
The following features were extracted from the sole domain name:
The following features were extracted from DNS responses when querying about the domain:
These features were derived from IP addresses and ICMP echo replies:
The following features were extracted from TLS certificate chains and TLS handshakes:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here is a list of the top content management systems currently available and their market share.
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.
English as the leading online language
The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.
Global internet usage by regions
As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset records the total number of domain names registered, by calendar year, up to the end of 2019.
As of 2025, there are about 24 million eCommerce sites worldwide—a drop from the previous high of 27 million but still far above the 9.2 million recorded in 2019. The United States alone accounts for nearly 12 million online stores, underlining the global shift to digital commerce.
The dataset contains normal, DGA and tunneling domain names: i. the normal domains comprise the Alexa top one million domains, 3,161 normal domains provided by the Bambenek Consulting feed, and another 177,017 normal domains; ii. the DGA domains were obtained from the DGA domain repositories of Andrey Abakumov and John Bambenek, and correspond to 51 different malware families; iii. the DNS tunneling domains consist of 8,000 tunnel domains generated under laboratory conditions using a set of well-known DNS tunneling tools: iodine, dnscat2 and dnsExfiltrator.
The dataset is described in the paper:
Palau, F., Catania, C., Guerra, J., García, S. J., & Rigaki, M. (2019). Detecting DNS threats: A deep learning model to rule them all. In XX Simposio Argentino de Inteligencia Artificial (ASAI 2019)-JAIIO 48 (Salta).
The economics news website Boursorama.com topped the ranking as the most visited online economics and legal newspaper in France as of July 2024, with more than 41.62 million visits. The websites LesEchos.fr and Capital.fr came in second and third, with around 26 and 25 million visits respectively in France in July 2024.
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset contains a weekly situation update on COVID-19, the epidemiological curve and the global geographical distribution (EU/EEA and the UK, worldwide).
Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has collected the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process was carried out on a daily basis until 14/12/2020. See the discontinued daily dataset: COVID-19 Coronavirus data - daily. ECDC’s decision to discontinue daily data collection is based on the fact that the daily number of cases reported or published by countries is frequently subject to retrospective corrections, delays in reporting and/or clustered reporting of data for several days. Therefore, the daily number of cases may not reflect the true number of cases at EU/EEA level at a given day of reporting. Consequently, day-to-day variations in the number of cases do not constitute a valid basis for policy decisions.
ECDC continues to monitor the situation. Every week between Monday and Wednesday, a team of epidemiologists screen up to 500 relevant sources to collect the latest figures for publication on Thursday. The data screening is followed by ECDC’s standard epidemic intelligence process for which every single data entry is validated and documented in an ECDC database. An extract of this database, complete with up-to-date figures and data visualisations, is then shared on the ECDC website, ensuring a maximum level of transparency.
ECDC receives regular updates from EU/EEA countries through the Early Warning and Response System (EWRS), The European Surveillance System (TESSy), the World Health Organization (WHO) and email exchanges with other international stakeholders. This information is complemented by screening up to 500 sources every day to collect COVID-19 figures from 196 countries. This includes websites of ministries of health (43% of the total number of sources), websites of public health institutes (9%), websites of other national authorities (ministries of social services and welfare, governments, prime minister cabinets, cabinets of ministries, websites on health statistics and official response teams) (6%), WHO websites and WHO situation reports (2%), and official dashboards and interactive maps from national and international institutions (10%). In addition, ECDC screens social media accounts maintained by national authorities, for example Twitter, Facebook, YouTube or Telegram accounts run by ministries of health (28%) and other official sources (e.g. official media outlets) (2%). Several media and social media sources are screened to gather additional information, which can be validated against the official sources mentioned above. Only cases and deaths reported by the national and regional competent authorities from the countries and territories listed are aggregated in our database.
Disclaimer: National updates are published at different times and in different time zones. This, and the time ECDC needs to process these data, might lead to discrepancies between the national numbers and the numbers published by ECDC. Users are advised to use all data with caution and awareness of their limitations. Data are subject to retrospective corrections; corrected datasets are released as soon as processing of updated national data has been completed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here is a list of the top 10 subcategories for WordPress.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was obtained from the UCI Machine Learning Repository in 2019. It consists of 11,055 instances with 31 attributes and contains no missing values. The dataset has two class labels: phishing is -1 and non-phishing is 1. Of the 11,055 instances, 4,898 belong to the phishing class and 6,157 to the non-phishing class.
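The class balance implied by these counts can be checked with a few lines of Python. The sketch below uses only the figures stated above; the majority-class baseline is added purely as an illustration of what the imbalance means for a classifier and is not part of the dataset description.

```python
# Class label -> number of instances, as stated in the description.
counts = {-1: 4898, 1: 6157}  # -1 = phishing, 1 = non-phishing

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the larger class
# (an illustrative baseline, not something reported by the source).
majority_baseline = max(shares.values())

print(total)
print({label: round(share, 4) for label, share in shares.items()})
print(round(majority_baseline, 4))
```

The two classes sum to the stated total, and the split is roughly 44 percent phishing to 56 percent non-phishing, so any classifier evaluated on this data should be compared against a baseline accuracy of about 0.56.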
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A dataset of various news articles scraped from different online news agencies’ websites. The total number of articles is 16,438, spread over eight different classes.
This statistic displays the leading themes used by Italian sites built on WordPress as of October 2019. According to the data, the most popular themes were Divi and Avada, accounting for **** and ***** percent of the total number of websites, respectively.