https://creativecommons.org/publicdomain/zero/1.0/
Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)
The categories list was due to be retired by September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314
This dataset was produced with this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking
The sites are grouped into 17 macro categories, and the resulting tree has more than 360,000 nodes. Subjects are well organized, and each has its own ranking of the most accessed domains. Even the keys of a single sub-dictionary may therefore make a good small dataset.
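As a rough illustration of working with such a tree, the sketch below counts nodes and lists the keys of one sub-dictionary. It assumes the dump is a nested dictionary in which inner dictionaries are subcategories and lists hold ranked domains. The `sample_tree` data and the helper names `count_nodes` and `iter_paths` are hypothetical, and the real file layout may differ.

```python
from typing import Any, Iterator, Tuple

# Hypothetical miniature of the category tree: inner dicts are subcategories,
# lists hold ranked domains. The real dump's layout may differ.
sample_tree = {
    "Arts": {
        "Music": {"Jazz": ["jazz.example", "bebop.example"]},
        "Movies": ["films.example"],
    },
    "Science": {"Physics": ["physics.example"]},
}


def count_nodes(tree: Any) -> int:
    """Count every category node (dict key) in the nested tree."""
    if not isinstance(tree, dict):
        return 0  # a list of domains is a leaf payload, not a category
    return sum(1 + count_nodes(child) for child in tree.values())


def iter_paths(tree: Any, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the full path of every category, depth first."""
    if isinstance(tree, dict):
        for name, child in tree.items():
            path = prefix + (name,)
            yield path
            yield from iter_paths(child, path)


# The keys of one sub-dictionary, usable as a small dataset on their own.
subcategories = sorted(sample_tree["Arts"].keys())

print(count_nodes(sample_tree))
print(["/".join(p) for p in iter_paths(sample_tree)])
print(subcategories)
```

The recursion is fine for a small sample. For the full tree, an explicit stack would avoid any recursion-depth concerns.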
Thanks to my friend André (https://github.com/andrerclaudio) for helping me with tips on Google Colaboratory and with the computational power to collect the data before our deadline.
The Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good starting point for AI to learn about many, many subjects.
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined by a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest.
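The description does not give the exact formula, so the following is only a minimal sketch of the idea, not Alexa's actual method. It makes two assumptions: repeated requests for the same URL by the same user on the same day collapse into one pageview, and the "combination" is the geometric mean of unique visitors and pageviews. The record layout, the sample `log`, and the function names are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (day, user, site, url)
Record = Tuple[str, str, str, str]

log: List[Record] = [
    ("2020-09-01", "u1", "a.example", "/"),
    ("2020-09-01", "u1", "a.example", "/"),       # repeat request, counted once
    ("2020-09-01", "u1", "a.example", "/about"),
    ("2020-09-01", "u2", "a.example", "/"),
    ("2020-09-01", "u1", "b.example", "/"),
    ("2020-09-01", "u3", "b.example", "/"),
    ("2020-09-01", "u4", "b.example", "/"),
]


def site_metrics(records: Iterable[Record]) -> Dict[str, Tuple[int, int]]:
    """Return {site: (unique_visitors, deduplicated_pageviews)}."""
    visitors: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    pageviews: Dict[str, Set[Tuple[str, str, str]]] = defaultdict(set)
    for day, user, site, url in records:
        visitors[site].add((day, user))        # one visitor per user per day
        pageviews[site].add((day, user, url))  # assumed deduplication rule
    return {site: (len(visitors[site]), len(pageviews[site])) for site in visitors}


def rank_sites(records: Iterable[Record]) -> List[str]:
    """Order sites by an assumed score: geometric mean of the two metrics."""
    metrics = site_metrics(records)

    def score(site: str) -> float:
        unique_visitors, views = metrics[site]
        return sqrt(unique_visitors * views)

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(metrics, key=lambda site: (-score(site), site))


print(site_metrics(log))
print(rank_sites(log))
```

Any other monotonic combination of the two metrics could be swapped into `score` without changing the rest of the sketch.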
Data from Fortune 500's 2023 ranking.
Includes data on the top 1000 companies with additional info (stock symbol/ticker, CEO name).
Update (New dataset): 2024 Fortune 1000 Companies
From Investopedia:
The Fortune 1000 is an annual list of the 1000 largest American companies maintained by the popular magazine Fortune. Fortune ranks the eligible companies by revenue generated from core operations, discounted operations, and consolidated subsidiaries. Since revenue is the basis for inclusion, every company is authorized to operate in the United States and files a 10-K or comparable financial statement with a government agency.
Fortune magazine publishes this list every year, and some lists can be found from different sources. In this year's available datasets, some features were missing or could not be found. This dataset was built by scraping the standard features as well as what is included under Company Info (such as CEO, ticker, and website) on the Fortune magazine website. Details on how the data was generated can be found in this notebook, where a few of the features are also visualized.
The source code of the 2023 Fortune 500 ranking page includes 1000 companies. It also includes, for each company, a reference page (slug) with additional info; these pages were scraped as well to complete the dataset.
Available formats: csv, parquet
Features are as follows:
[Note: References to datatypes are relevant when using the parquet file; Labels refer to the original website names]
This statistic illustrates the website performance index of the main national heritage sites in Italy as of 2018. According to the figures, the website of the Last Supper by Leonardo Da Vinci, in Milan, was the one with the best performance in Italy. Moreover, the website of the archeological site of Pompeii ranked second in the list, with a performance score of 91.
In December 2023, Amazon.com was the leading online shopping website in the United States. During the measured period, the sprawling platform accounted for over 45 percent of desktop traffic in the e-commerce and shopping subcategory. In second place on the list was eBay.com, with 9.22 percent of visitors. Walmart ranked third with a bit less than six percent of web traffic.

Why customers browse on Amazon
The main reason behind the outstanding online traffic to Amazon is user behavior throughout the customer journey. Amazon serves as a search engine for U.S. consumers, with 73 percent browsing it for inspiration and product discovery. Another 65 percent of U.S. shoppers landed on Amazon to look for products and compare products. In turn, Google is left third in the ranking of most used platforms.

Generational differences
In the beauty segment, the customer journey is more likely to start on Amazon among senior consumers. In the United States, 44 percent of Baby Boomers started their search of beauty products on the marketplace, while only 35 percent of Gen Z consumers reported doing the same.
Find detailed info on any Shopify store in the world. The data includes:
- Store Name
- Store URL
- Product Distribution and Categories
- Alexa Ranks
- Number Of Products
- Country (Please look into other products for the global lists)
- Email and Contact Info
- Social Media Outlets With URLs
Additional fields can be added via requirements as well.
In November 2024, Google.com held the top spot in India's website rankings, averaging over **** billion monthly visits. YouTube ranked second, with traffic of **** billion visits, while social platforms Instagram.com and Facebook.com followed with *** million and *** million monthly visits each.

Internet penetration
In the past decade, India has witnessed a remarkable transformation in its digital landscape. This substantial expansion has resulted in extensive digital connectivity, with more than **** of India's *** billion citizens now enjoying internet access. India ranked **** on the Digital Quality of Life Index in 2023, which revealed electronic infrastructure as one of the country’s strengths.

YouTube in India
As of 2025, India had the world’s largest YouTube user base, figuring over *** million users. The video platform caters to the nation’s tech-savvy denizens as an educational resource and a source of entertainment. Moreover, YouTube has evolved into a dynamic space for digital marketing, especially harnessing the consumer base segment aged below 32 years.
Academic journals indicators developed from the information contained in the Scopus database (Elsevier B.V.). These indicators can be used to assess and analyze scientific domains.
This is a list of cleanup sites in Washington State. It includes sites and associated websites. It includes location data, Cleanup Status, Site Rank – if the site is ranked, and if the site has an Environmental Covenant.
Over half the cleanup sites have a status of “No Further Action Required/Decision” or NFA. If a site has an NFA, it includes the latest NFA date and NFA reason.
The Washington Department of Ecology (Toxics Cleanup Program) works to clean up contaminated sites/properties throughout the state of Washington. This data was downloaded from the Integrated Site Information System (ISIS) database and is updated monthly.
https://webtechsurvey.com/terms
A complete list of live websites using the Simple Site Rating technology, compiled through global website indexing conducted by WebTechSurvey.
In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

The internet leaders: search, social, and e-commerce
Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

What is next for online content?
Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
The Easiest Way to Collect Data from the Internet
Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers or a few lines of code using our APIs.
We have made it as simple as possible to collect data from websites
Easy to Use Crawlers
Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.
Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.
Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.
Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more from Amazon.
Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.
Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.
Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.
Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.
Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.
Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.
Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.
Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.
Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.
Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.
LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.
Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.
Yelp Business Details Scraper: Scrape business details from Yelp such as phone number, address, website, and more from Yelp search and business details pages.
Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, broker, broker name, and more.
Amazon Product Offers and Third Party Sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.
Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.
Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.
Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.
Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, answer, answered user name, and more.
Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.
From September to November 2023, internet users from France visiting search platform Google.fr saw an average of 3.5 pages per visit. Leading the list, the adult video website Xvideos.com held the top rank, with users viewing an average of 8.3 pages each session. The social video platform YouTube.com came in third, with an average of 5.2 pages per visit, while the e-commerce platform Amazon.fr, which ranked fifth, averaged 4.9 pages per session.
In November 2024, Google.com was the most popular website worldwide with approximately 6.25 billion unique monthly visitors. YouTube.com was ranked second with an estimated 3.64 billion unique monthly visitors. Both websites are among the most visited websites worldwide.
In May 2025, booking.com was the most visited travel and tourism website worldwide. That month, Booking’s web page recorded around *** million visits. Tripadvisor.com and airbnb.com followed in the ranking, with roughly *** million and ** million visits, respectively.

Popular online travel agencies in the U.S.
Online travel agencies (OTAs), such as Booking.com and Expedia, offer a wide variety of services, including online hotel bookings, flight reservations, and car rentals. According to the Statista Consumer Insights Global survey, when looking at flight search engine online bookings by brand in the United States, Booking.com and Expedia were the most popular options when it came to making online flight reservations in 2025. When focusing on hotel and private accommodation online bookings in the U.S., Booking.com was again the most popular brand, followed by Airbnb, Expedia, and Hotels.com.

Booking Holdings vs. Expedia Group
Booking.com is one of the most popular sites of the online travel group Booking Holdings, the leading online travel agency worldwide based on revenue, that also owns brands like Priceline, Kayak, and Agoda. In 2024, Booking Holdings' revenue amounted to almost ** billion U.S. dollars, the highest figure reported by the company to date. Meanwhile, global revenue of Expedia Group, which manages brands like Expedia, Hotels.com, and Vrbo, reached nearly ** billion U.S. dollars that year.
From September to November 2023, search platform Google.com was the top ranking website in Canada, with average monthly traffic of almost four billion visits. YouTube ranked second with almost three billion visits. Social network Facebook.com ranked third, with total monthly traffic of 470 million visits.
As of 2024, WordPress.org is the leading website builder in the world, accounting for over ** percent of global market share. Wix and Squarespace ranked as the second and third most popular website building platforms, each of which accounted for ** and *** percent respectively of market share.

Website builders
Website building tools such as Wix and Squarespace allow for users to construct and manage customizable websites without the need for advanced coding knowledge. These platforms often include customizable design templates and offer extensions for e-commerce and mailing lists. Although Wix has the biggest worldwide market share, Weebly and Squarespace rank as the most popular platforms in the United States. As the functionality offered by these platforms increases, so too does the market’s overall revenue figure. Between 2012 and 2017 website builder revenue increased from around **** billion U.S. dollars to over *** billion dollars. As overall web traffic and global internet access continue to rise, it has become increasingly important for businesses to have an online site for sales, marketing, and general contact information. These website building tools allow businesses of all sizes to maintain an online presence without having to spend huge sums of money on web design or coding.
In November 2024, Google.com was the leading website in the United Kingdom with more than 4.16 billion monthly visits. The search engine was also popular in its UK top-level domain, with Google.co.uk reaching 267 million views and placing tenth in the ranking. YouTube and Facebook were the most visited social media platforms, ranking as the second and fifth most visited websites in the country.