This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest.
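As a rough illustration of the two metrics described above, the sketch below counts unique visitors and deduplicated pageviews from a one-day request log and then orders sites by a combined score. The log layout, the function names, the deduplication key (user plus URL), and the use of a geometric mean as the "combination" are all assumptions made for this example; the description does not say how Alexa actually combined the two numbers.

```python
from collections import defaultdict
from math import sqrt
from urllib.parse import urlparse


def site_metrics(requests):
    """Return {site: (unique_visitors, pageviews)} for one day's request log.

    `requests` is an iterable of (user_id, url) pairs, all from the same day
    (a hypothetical log format chosen for this sketch).
    """
    visitors = defaultdict(set)  # site -> distinct users
    views = defaultdict(set)     # site -> distinct (user, url) pairs
    for user_id, url in requests:
        site = urlparse(url).netloc.lower()
        visitors[site].add(user_id)
        views[site].add((user_id, url))  # repeated requests collapse here
    return {site: (len(visitors[site]), len(views[site])) for site in visitors}


def rank_sites(metrics):
    """Order sites by a combined score (geometric mean: an assumption)."""
    def score(item):
        unique_visitors, pageviews = item[1]
        return sqrt(unique_visitors * pageviews)

    ordered = sorted(metrics.items(), key=score, reverse=True)
    return [site for site, _ in ordered]


log = [
    ("u1", "https://a.example/"),
    ("u1", "https://a.example/"),       # same user, same URL: counted once
    ("u1", "https://a.example/about"),
    ("u2", "https://a.example/"),
    ("u1", "https://b.example/"),
]
metrics = site_metrics(log)
ranking = rank_sites(metrics)
```

Here `a.example` ends up with 2 unique visitors and 3 pageviews, so it ranks above `b.example`, which has one of each.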
Google.com was the website with the most page views per day in Bolivia in February 2022, according to ranking by Alexa. The website had more than 18.49 daily page views and was followed by Unitel.bo, with 11 page views per day that month. Within Latin America, Mexico was the country where Amazon Alexa contained the largest number of skills.
https://creativecommons.org/publicdomain/zero/1.0/
Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)
The categories list was going to be taken down by September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314
This dataset was produced by this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking
The sites are grouped into 17 macro categories, and the resulting tree has more than 360,000 nodes. Subjects are very well organized, and each of them has its own ranking of most accessed domains. So even the keys of a sub-dictionary may make a good small dataset to use.
Thank you to my friend André (https://github.com/andrerclaudio) for helping me with tips on Google Colaboratory and with computational power to get the data before our deadline.
The Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good starting point for AI to learn about many, many subjects of the world.
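To make the idea of using a sub-dictionary's keys concrete, here is a minimal sketch that counts the nodes of a nested category tree and lists the subcategories under one branch. It assumes the dump is a nested dictionary keyed by category name; the actual layout produced by the script, and the sample categories below, are hypothetical.

```python
def count_nodes(tree):
    """Count every key in a nested dict, at all depths."""
    if not isinstance(tree, dict):
        return 0
    return sum(1 + count_nodes(child) for child in tree.values())


def subcategories(tree, path):
    """Return the sorted keys of the sub-dictionary reached by following `path`."""
    node = tree
    for key in path:
        node = node[key]
    return sorted(node)


# Hypothetical miniature of the category tree (names are illustrative only).
categories = {
    "Science": {"Physics": {"Optics": {}}, "Biology": {}},
    "Sports": {"Soccer": {}},
}

total = count_nodes(categories)
science_topics = subcategories(categories, ["Science"])
```

With this toy tree, `total` is 6 and `science_topics` is `["Biology", "Physics"]`; the same two calls would apply to the full tree if it follows the assumed layout.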
https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to alexa-ranking.com (Domain). Get insights into ownership history and changes over time.
In 2019, the Chinese marketplace Alibaba was the leading B2B e-commerce site worldwide in terms of online traffic. The Alexa tool for assessing the online traffic of websites put it at the top of the ranking, with a score of 177. The Russian Rosfirm and the U.S. platform Vinsuite followed in the ranking with scores of 1,047 and 1,137, respectively.
Traffic analytics, rankings, and competitive metrics for alexa.com as of May 2025
This dataset was created by DNS_dataset
From September to November 2023, YouTube.com was the most popular website in Spain by time per visit, with an average session length of approximately 32 minutes and 16 seconds. AnimeFLV.net ranked second, with an average of 32 minutes and six seconds per visit. Despite being the leading website by total visits and unique visitors in the country, Google.com ranked third in engagement time, with 21 minutes and 13 seconds per session.
This statistic shows the results of a 2014 Popsugar survey among American women asking them how important an appealing design is for a website that shows online visual content. During the survey, 9.5 percent of female respondents ranked it at the top.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is designed to aid in the analysis and detection of phishing websites. It contains various features that help distinguish between legitimate and phishing websites based on their structural, security, and behavioral attributes.
Result (Indicates whether a website is phishing or legitimate)
Prefix_Suffix – Checks if the URL contains a hyphen (-), which is commonly used in phishing domains.
double_slash_redirecting – Detects if the URL redirects using //, which may indicate a phishing attempt.
having_At_Symbol – Identifies the presence of @ in the URL, which can be used to deceive users.
Shortining_Service – Indicates whether the URL uses a shortening service (e.g., bit.ly, tinyurl).
URL_Length – Measures the length of the URL; phishing URLs tend to be longer.
having_IP_Address – Checks if an IP address is used in place of a domain name, which is suspicious.
having_Sub_Domain – Evaluates the number of subdomains; phishing sites often have excessive subdomains.
SSLfinal_State – Indicates whether the website has a valid SSL certificate (secure connection).
Domain_registeration_length – Measures the duration of domain registration; phishing sites often have short lifespans.
age_of_domain – The age of the domain in days; older domains are usually more trustworthy.
DNSRecord – Checks if the domain has valid DNS records; phishing domains may lack these.
Favicon – Determines if the website uses an external favicon (which can be a sign of phishing).
port – Identifies if the site is using suspicious or non-standard ports.
HTTPS_token – Checks if "HTTPS" is included in the URL but is used deceptively.
Request_URL – Measures the percentage of external resources loaded from different domains.
URL_of_Anchor – Analyzes anchor tags (<a> links) and their trustworthiness.
Links_in_tags – Examines <meta>, <script>, and <link> tags for external links.
SFH (Server Form Handler) – Determines if form actions are handled suspiciously.
Submitting_to_email – Checks if forms submit data directly to an email instead of a web server.
Abnormal_URL – Identifies if the website’s URL structure is inconsistent with common patterns.
Redirect – Counts the number of redirects; phishing websites may have excessive redirects.
on_mouseover – Checks if the website changes content when hovered over (used in deceptive techniques).
RightClick – Detects if right-click functionality is disabled (phishing sites may disable it).
popUpWindow – Identifies the presence of pop-ups, which can be used to trick users.
Iframe – Checks if the website uses <iframe> tags, often used in phishing attacks.
web_traffic – Measures the website’s Alexa ranking; phishing sites tend to have low traffic.
Page_Rank – Google PageRank score; phishing sites usually have a low PageRank.
Google_Index – Checks if the website is indexed by Google (phishing sites may not be indexed).
Links_pointing_to_page – Counts the number of backlinks pointing to the website.
Statistical_report – Uses external sources to verify if the website has been reported for phishing.
Result – The classification label (1: Legitimate, -1: Phishing)
This dataset is valuable for:
✅ Machine Learning Models – Developing classifiers for phishing detection.
✅ Cybersecurity Research – Understanding patterns in phishing attacks.
✅ Browser Security Extensions – Enhancing anti-phishing tools.
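A few of the URL-based features listed above can be approximated directly from a URL string. The sketch below is only an illustration of that idea: the function name, the choice to return raw booleans and a length rather than the dataset's encoded values, and the decision to look for the hyphen in the host part only are assumptions, not the procedure used to build the dataset.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Derive a handful of lexical features from a URL (illustrative only)."""
    host = urlparse(url).hostname or ""

    # having_IP_Address: the host parses as an IPv4 or IPv6 literal.
    try:
        ipaddress.ip_address(host)
        has_ip = True
    except ValueError:
        has_ip = False

    # Everything after the scheme separator, so the leading "//" is ignored.
    after_scheme = url.split("://", 1)[-1]

    return {
        "Prefix_Suffix": "-" in host,  # hyphen checked in the host (assumption)
        "double_slash_redirecting": "//" in after_scheme,
        "having_At_Symbol": "@" in url,
        "URL_Length": len(url),
        "having_IP_Address": has_ip,
    }


suspicious = url_features("http://192.168.0.1/login//redirect@evil")
hyphenated = url_features("https://secure-login.example.com/")
```

Features such as SSLfinal_State, age_of_domain, or web_traffic need external lookups (certificates, WHOIS, ranking data) and are outside the scope of this sketch.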
The statistic shows the distribution of Amazon Alexa skill ratings worldwide as of 2018. Around 61 percent of all Amazon Alexa skills have received zero ratings.
When you’re at work or at home, there’s a high chance that you’re going to use Google. You may be using Google to find a plumber for your leaky bathroom sink or to see where the best sushi in town is. When you’re on Google, you’re looking for the top results, which means you’re not scrolling past page one unless you’re desperate. So, getting your company on the first page of Google is extremely important and impactful for success. But how do you get your business there? Well, here’s how.
Know the Basics
Before you do anything, you need to know the basics of how online marketing and Google search function. By knowing the basics, you won’t waste time performing outdated tasks or be overcharged by an SEO Agency http://www.whitehatagency.com.au/seo-agency or marketing companies that recognize your lack of knowledge. Education is the key to success.
Use SEO
Search Engine Optimization is the method of attracting online attention and visibility through organic means. In essence, that means the unpaid search results; the paid search results typically have “sponsored” or “paid advertisement” written below them. You can naturally drive traffic towards your site just by using the right keywords. Certain keywords will push your content, allowing it to be shown in the top results.
Meet the Google Standards
Google has standards which your website must fulfil before it can appear as a #1 website in the search results. Google will flag any errors it deems in need of fixing, for example broken links, and you’re going to want to fix them. If you don’t meet the standards, they’ll penalize all your pages until you fix them. So, take some time out of your day and make sure your website is fully functioning.
Content is everything
You may invest some top dollars in the look and appeal of your site but at the end of the day, it doesn’t really matter what your site looks like. What truly matters is the content as that will drive viewers to your site. The Google search results are designed to provide users with the most relevant material on the web. If your content isn’t providing value to the viewers, your material won’t make it to #1.
Focus on links
Links play an important role when it comes to Google’s ranking system. The way Google works is that it pays attention to the hyperlinks in content to figure out what keywords are tied to the link being used. This doesn’t mean your entire article should be made of links, though. If you use too many, they’ll deem it suspicious activity and your website can be removed from Google.
Google loves mobile-friendly
If you want to come up with a #1 site then you need to show Google that you’re updated and relevant to the current technology. In other words, you need to make your site mobile-friendly. Many users read material while on their way to work, on the bus or on their couch. So, if you’re not catering to smartphones, well, Google isn’t going to favor you.
Lebrau, C. (2020). 6 Ways to Get on the First Page of Google, HydroShare, http://www.hydroshare.org/resource/5b487a7dc6104628b10c2b6921b595e1
In November 2024, Google.com held the top spot in India's website rankings, averaging over **** billion monthly visits. YouTube ranked second, with traffic of **** billion visits, while social platforms Instagram.com and Facebook.com followed with *** million and *** million monthly visits each.
Internet penetration
In the past decade, India has witnessed a remarkable transformation in its digital landscape. This substantial expansion has resulted in extensive digital connectivity, with more than **** of India's *** billion citizens now enjoying internet access. India ranked **** on the Digital Quality of Life Index in 2023, which revealed electronic infrastructure as one of the country’s strengths.
YouTube in India
As of 2025, India had the world’s largest YouTube user base, figuring over *** million users. The video platform caters to the nation’s tech-savvy denizens as an educational resource and a source of entertainment. Moreover, YouTube has evolved into a dynamic space for digital marketing, especially harnessing the consumer base segment aged below 32 years.
This ranking report attempts to identify the best law school home pages based exclusively on objective criteria. The goal is to assess elements that make websites easier to use for sighted as well as visually-impaired users. Most elements require no special design skills, sophisticated technology or significant expenses. Ranking results in this report represent reasonably relevant elements. In this report, 200 ABA-accredited law school home pages are analyzed and ranked for twenty elements in three broad categories: Design Patterns & Metadata; Accessibility & Validation; and Marketing & Communications. As was the case in 2009, there is still no objective way to account for good taste. For interpreting these results, we don't try to decide if any whole is greater or less than the sum of its parts.
As of May 2019, Google had the highest reach among services web properties. The website is currently among the top 10 websites in Vietnam in the Alexa ranking.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 15 websites with highest PageRank. The numbers in the parentheses are the ranking orders according to the focus indicators.
Data set of visits to the website of the 2017 edition of the CWTS Leiden Ranking (www.leidenranking.com/ranking/2017) in the period between May 17, 2017 and February 28, 2018.
This is a list of cleanup sites in Washington State. It includes sites and associated websites. It includes location data, Cleanup Status, Site Rank – if the site is ranked, and if the site has an Environmental Covenant.
Over half the cleanup sites have a status of “No Further Action Required/Decision” or NFA. If a site has a NFA it includes the latest NFA date and NFA reason.
The Washington Department of Ecology (Toxics Cleanup Program) works to clean up contaminated sites/properties throughout the state of Washington. This data was downloaded from the Integrated Site Information System (ISIS) database and is updated monthly.
Data from Fortune 500's 2023 ranking.
Includes data on top 1000 companies w/ additional info (Stock symbol/*ticker*, CEO name).
Update (New dataset): 2024 Fortune 1000 Companies
From Investopedia:
The Fortune 1000 is an annual list of the 1000 largest American companies maintained by the popular magazine Fortune. Fortune ranks the eligible companies by revenue generated from core operations, discontinued operations, and consolidated subsidiaries. Since revenue is the basis for inclusion, every company is authorized to operate in the United States and files a 10-K or comparable financial statement with a government agency.
Fortune magazine publishes this list every year and some lists can be found from different sources. From looking at this year's available datasets, some features were missing or could not be found. This was built from scraping the standard features as well as what's included on Company Info (such as CEO, Ticker and website) from the Fortune magazine website. Details on how the data was generated can be found on this notebook where a few of the features were also visualized.
The source code of the 2023 Fortune 500 ranking page includes 1000 companies. A reference page (slug) with additional info is included for each company; these pages were also scraped to complete the dataset.
Available formats: csv, parquet
Features are as follows:
[Note: References to datatypes are relevant when using the parquet file; Labels refer to the original website names]
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
What is a high quality website?
For years the whole SEO industry has been talking about the need to produce high quality content, and top experts came up with the clever quote ‘Content is king’, meaning that content is the success factor of any website. While this is true, does it mean that a website with good content is also a high quality website? The answer is no. Good content is not enough. It is one of the factors (the most important one) that separates low quality from high quality sites, but good content alone does not complete the puzzle of what Google considers a high quality website. You can now find high quality on high quality sites like Nytimes, Zeebiz, Zeenews.india, Forbes etc. You can also buy a Zeebiz guest post at a reasonable price from the best guest post service.
What is SEO
SEO is short for ‘Search Engine Optimization’. It refers to the process of increasing a website's traffic flow by optimizing several aspects of the website, such as on-page SEO, technical SEO and off-site SEO. Your SEO strategy should ideally be planned around your content strategy. For this you will require three elements to piece your content strategy together: 1.) keywords, 2.) links and 3.) substance. A guest post on high quality sites can improve your SEO ranking. To improve and boost your ranking, buy a guest post on Zeebiz from the high quality guest post service.
Characteristics of a high quality website
A high quality website has the following characteristics:
Unique content
Content is unique both within the website itself (i.e. each page has unique content that is not similar to other pages) and compared to other websites.
Demonstrate Expertise
Content is produced by experts based on research and/or experience. If, for example, the subject is health related, then the advice should be provided by qualified authors who can professionally give advice on the particular subject.
Unbiased content
Content is detailed, describes both sides of a story and does not promote a single product, idea or service.
Accessibility
A high quality website has versions for non-PC users as well. It is important that mobile and tablet users can access the website without any usability issues.
Usability
Can the user navigate the website easily? Is the website user friendly?
Attention to detail
Content is easy to read, with images (if applicable), and free of spelling and grammar mistakes. Does it seem that the owner cares about what is published on the website, or is the content there only in order to run ads?
SEO Optimized
Optimizing a website for search engines has many benefits, but it is important not to overdo it. A good quality website needs to have non-optimized content as well. This is my opinion, and although some people may disagree, it is a fact that over-optimization can sometimes generate the opposite results. The reason is that algorithms can sometimes interpret over-optimization as an attempt to game the system, and they may take measures to prevent this from happening.
Balance between content and ads
It is not a bad thing for a website to have ads or promotions, but these should not distract users from finding the information they need.
Speed
A high quality website loads fast. A fast website will rank higher and create more conversions, sales and loyal readers.
Social
Social media changed our lives and the way we communicate, but also the way we assess quality. A good product is expected to have good reviews, Facebook likes and Tweets. Before you make a decision to buy or not, you may examine these social factors as well. Likewise, a good website is expected to be socially accepted and recognized, i.e. to have Facebook followers, RSS subscribers etc.
User Engagement and Interaction
Do users spend enough time on the site and read more than one page before they leave? Do they interact with the content by adding comments, making suggestions, getting into conversations etc.?
Better than the competition
When you take a specific keyword, is your website better than your competitors'? Does it deserve one of the top positions if judged without bias?