100+ datasets found

Traces captured by visiting the top 1500 website
kaggle.com
zip
Updated Aug 25, 2021
Cite
DNS_dataset (2021). Traces captured by visiting the top 1500 website [Dataset]. https://www.kaggle.com/jacksontang16/traces-captured-by-visiting-the-top-1500-website
Available download formats: zip (5852806 bytes)
Dataset updated
Aug 25, 2021
Authors
DNS_dataset
Description
This dataset was created by DNS_dataset

Most visited websites by hierachycal categories
kaggle.com
Updated Sep 18, 2020
Cite
Natanael de Souza Figueiredo (2020). Most visited websites by hierachycal categories [Dataset]. https://www.kaggle.com/natanael127/most-visited-websites-by-hierachycal-categories/code
Dataset updated
Sep 18, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Natanael de Souza Figueiredo
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)

The categories list was due to be discontinued by September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314

This dataset was produced with this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking

Content

The sites are grouped into 17 macro categories, and the resulting tree has more than 360,000 nodes. Subjects are well organized, and each has its own ranking of most-accessed domains. So even the keys of a sub-dictionary may make a good small dataset.

Acknowledgements

Thanks to my friend André (https://github.com/andrerclaudio) for helping me with Google Colaboratory tips and computational power to collect the data before our deadline.

Inspiration

The Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good starting point for AI to learn about many subjects of the world.
(Dataset) The most visited health websites in the world
narcis.nl
data.mendeley.com
Updated Jan 11, 2021
Cite
Acosta-Vargas, P (via Mendeley Data) (2021). (Dataset) The most visited health websites in the world [Dataset]. http://doi.org/10.17632/n468trh5my.1
Unique identifier
https://doi.org/10.17632/n468trh5my.1
Dataset updated
Jan 11, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Acosta-Vargas, P (via Mendeley Data)
Description
Evaluation of the most visited health websites in the world
Alexa Domains Dataset
paperswithcode.com
opendatalab.com
Updated Feb 1, 2001
Cite
Isaac Corley; Jonathan Lwowski; Justin Hoffman (2001). Alexa Domains Dataset [Dataset]. https://paperswithcode.com/dataset/gagan-bhatia
Dataset updated
Feb 1, 2001
Authors
Isaac Corley; Jonathan Lwowski; Justin Hoffman
Description
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest.
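The two measures described above can be illustrated with a minimal sketch. It is not Alexa's actual method: the log layout `(user, site, url, day)`, the reading of the de-duplication rule as "same user, same URL, same day", and the use of a geometric mean as the "combination" are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def traffic_metrics(requests):
    """Return {site: (unique_visitors, pageviews)} from a request log.

    `requests` is an iterable of (user, site, url, day) tuples; this
    layout is hypothetical and not taken from the dataset.
    """
    visitors = defaultdict(set)   # site -> {(user, day)}
    pageviews = defaultdict(set)  # site -> {(user, url, day)}
    for user, site, url, day in requests:
        # A user counts once per site per day.
        visitors[site].add((user, day))
        # Assumption: repeated requests by the same user for the same
        # URL on the same day collapse into a single pageview.
        pageviews[site].add((user, url, day))
    return {s: (len(visitors[s]), len(pageviews[s])) for s in visitors}

def rank_sites(metrics):
    """Order sites from most to least popular.

    Assumption: the two measures are combined with a geometric mean;
    the description only says "combination".
    """
    def score(site):
        unique_visitors, views = metrics[site]
        return math.sqrt(unique_visitors * views)
    return sorted(metrics, key=score, reverse=True)
```

For example, calling `rank_sites(traffic_metrics(log))` on a small log returns the site names ordered by the combined score, highest first.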
Colombia: most visited websites 2024, by unique visitors
statista.com
ai-chatbox.pro
Updated Jun 4, 2025
Cite
Statista (2025). Colombia: most visited websites 2024, by unique visitors [Dataset]. https://www.statista.com/statistics/1409003/most-visited-websites-unique-visitors-colombia/
Dataset updated
Jun 4, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2024
Area covered
Colombia
Description
In November 2024, Google.com was the leading website in Colombia by unique visits, with around 52.9 million single accesses to the URL during that month. YouTube.com came in second with approximately 30.9 million unique monthly visits. Facebook ranked third with 24.2 million unique monthly visits.
Top 50 Pages By Pageviews on Austintexas.gov -
data.austintexas.gov
gimi9.com
+1 more
application/rdfxml +5
Updated Dec 6, 2023
Cite
City of Austin, Texas - data.austintexas.gov (2023). Top 50 Pages By Pageviews on Austintexas.gov - [Dataset]. https://data.austintexas.gov/City-Government/Top-50-Pages-By-Pageviews-on-Austintexas-gov-/8yfa-b3bq
Available download formats: csv, xml, application/rdfxml, application/rssxml, json, tsv
Dataset updated
Dec 6, 2023
Dataset authored and provided by
City of Austin, Texas - data.austintexas.gov
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
This data, exported from Google Analytics, displays the 50 most popular pages on Austintexas.gov based on the following:
- Views: The total number of times the page was viewed. Repeated views of a single page are counted.
- Bounce Rate: The percentage of single-page visits (i.e. visits in which the person left your site from the entrance page without interacting with the page).

*Note: On July 1, 2023, standard Universal Analytics properties will stop processing data.
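As a rough illustration of the Views and Bounce Rate definitions in this description, the sketch below computes both from a list of sessions. The session representation (an ordered list of page paths per visit) is an assumption, and a visit is treated as a bounce simply when it contains one page view, which only approximates the "without interacting" condition. It is not how Google Analytics computes its reports.

```python
def page_stats(sessions):
    """Compute views and bounce rate per page.

    `sessions` is a list of visits, each an ordered list of page paths
    (a hypothetical layout chosen for this sketch).
    """
    views, entrances, bounces = {}, {}, {}
    for pages in sessions:
        if not pages:
            continue
        for page in pages:
            # Repeated views of the same page are all counted.
            views[page] = views.get(page, 0) + 1
        entry = pages[0]
        entrances[entry] = entrances.get(entry, 0) + 1
        if len(pages) == 1:
            # Single-page visit: counted as a bounce for the entrance page.
            bounces[entry] = bounces.get(entry, 0) + 1
    stats = {}
    for page, count in views.items():
        started = entrances.get(page, 0)
        # Bounce rate is undefined for pages that never start a visit.
        rate = 100.0 * bounces.get(page, 0) / started if started else None
        stats[page] = {"views": count, "bounce_rate": rate}
    return stats

def top_pages(stats, n=50):
    """Return the n most viewed page paths, most viewed first."""
    return sorted(stats, key=lambda p: stats[p]["views"], reverse=True)[:n]
```

Passing the result of `page_stats` to `top_pages` gives a "top 50 by views" list in the spirit of this dataset.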
‘Popular Website Traffic Over Time ’ analyzed by Analyst-2
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Have you ever been in a conversation where the question comes up: who uses Bing? This question comes up occasionally because people wonder whether these sites get any views. For this research study, we explore website traffic for many popular websites.

Methodology

The data collected originates from SimilarWeb.com.

Source

For the analysis and study, go to The Concept Center

This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

How to use this dataset

Analyze 11/1/2016 in relation to 2/1/2017

Study the influence of 4/1/2017 on 1/1/2017


Acknowledgements

If you use this dataset in your research, please credit Chase Willden


--- Original source retains full ownership of the source dataset ---
Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...
ieee-dataport.org
Updated Oct 21, 2024
Cite
Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
Dataset updated
Oct 21, 2024
Authors
Mohamad Amar Irsyad Mohd Aminuddin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data were collected with the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs are the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL is visited 90 times in each of the desktop and mobile browsing modes.
Top syndicated pages from CDC.gov by weekly page views
data.virginia.gov
healthdata.gov
+4 more
csv, json, rdf, xsl
Updated Aug 11, 2023
Cite
Centers for Disease Control and Prevention (2023). Top syndicated pages from CDC.gov by weekly page views [Dataset]. https://data.virginia.gov/dataset/top-syndicated-pages-from-cdc-gov-by-weekly-page-views
Available download formats: csv, xsl, rdf, json
Dataset updated
Aug 11, 2023
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
The CDC Content Syndication site at https://tools.cdc.gov/syndication/ allows you to import content from CDC websites directly into your own website or application. These services are provided free of charge from CDC. The data shown in this table represent the weekly top page views from CDC.gov offered by syndication.
Top 15 websites with highest PageRank.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Peiteng Shi; Xiaohan Huang; Jun Wang; Jiang Zhang; Su Deng; Yahui Wu (2023). Top 15 websites with highest PageRank. [Dataset]. http://doi.org/10.1371/journal.pone.0136243.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0136243.t003
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Peiteng Shi; Xiaohan Huang; Jun Wang; Jiang Zhang; Su Deng; Yahui Wu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Top 15 websites with highest PageRank. The numbers in parentheses are the ranking orders according to the focus indicators.
What social Media People like the most and why?
kaggle.com
Updated Feb 17, 2023
Cite
Nina Luquez (2023). What social Media People like the most and why? [Dataset]. https://www.kaggle.com/ninaluquez/what-social-media-people-like-the-most-and-why/discussion
Dataset updated
Feb 17, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nina Luquez
Description
This dataset was created by Nina Luquez

Most popular websites in the Netherlands 2015
datacatalogue.cessda.eu
ssh.datastations.nl
Updated Jul 4, 2023
Cite
M. Kleppe; H. Bijleveld (2023). Most popular websites in the Netherlands 2015 [Dataset]. http://doi.org/10.17026/dans-x6h-6qqt
Unique identifier
https://doi.org/10.17026/dans-x6h-6qqt
Dataset updated
Jul 4, 2023
Dataset provided by
Vrije Universiteit Amsterdam
Authors
M. Kleppe; H. Bijleveld
Area covered
Netherlands
Description
This dataset contains a list of 3654 Dutch websites that we considered the most popular websites in 2015. This list served as whitelist for the Newstracker Research project in which we monitored the online web behaviour of a group of respondents.
The research project 'The Newstracker' was a subproject of the NWO-funded project 'The New News Consumer: A User-Based Innovation Project to Meet Paradigmatic Change in News Use and Media Habits'.
For the Newstracker project we aimed to understand the web behaviour of a group of respondents. We created custom-built software to monitor their web browsing behaviour on their laptops and desktops (the code is available in open access at https://github.com/NITechLabs/NewsTracker). For reasons of scale and privacy we created a whitelist of the websites that were the most popular in 2015. We compiled this list manually using data from DDMM, Alexa and our own research. The dataset consists of 5 columns:
- the URL
- the type of website: We created a list of website types and manually labelled each website with one category
- Nieuws-regio: When the category was 'News', we subdivided these websites by regional focus: International, National or Local
- Nieuws-onderwerp: Each website in the News category was further subdivided by type of news website. For this we created our own list of news categories and manually coded each website
- Bron: For each website we noted which source we used to find it.
The full description of the research design of the Newstracker including the set-up of this whitelist is included in the following article: Kleppe, M., Otte, M. (in print), 'Analysing & understanding news consumption patterns by tracking online user behaviour with a multimodal research design', Digital Scholarship in the Humanities, doi 10.1093/llc/fqx030.
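The five-column layout described above lends itself to simple filtering. The sketch below groups news websites by regional focus; it assumes a CSV export in which the first two columns are named `URL` and `Type` (hypothetical names, since the description does not give them), and the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only; the header names "URL" and
# "Type" are assumptions, the other three come from the description.
SAMPLE = """URL,Type,Nieuws-regio,Nieuws-onderwerp,Bron
example-news.nl,News,National,General,Alexa
example-local.nl,News,Local,General,own research
example-shop.nl,Shopping,,,DDMM
"""

def news_sites_by_region(csv_text):
    """Group the URLs of websites labelled 'News' by their Nieuws-regio value."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only websites in the News category carry a regional focus.
        if row["Type"] == "News":
            groups[row["Nieuws-regio"]].append(row["URL"])
    return dict(groups)
```

Calling `news_sites_by_region(SAMPLE)` returns a dictionary keyed by region; with the real file, the text would be read from disk and the column names adjusted to match.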
1k_Website_Screenshots_and_Metadata
huggingface.co
Updated Apr 13, 2023
Cite
Silatus (2023). 1k_Website_Screenshots_and_Metadata [Dataset]. https://huggingface.co/datasets/silatus/1k_Website_Screenshots_and_Metadata
Dataset updated
Apr 13, 2023
Dataset authored and provided by
Silatus
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for 1000 Website Screenshots with Metadata

Dataset Summary

Silatus is sharing, for free, a segment of a dataset that we are using to train a generative AI model for text-to-mockup conversions. This dataset was collected in December 2022 and early January 2023, so it contains very recent data from 1,000 of the world's most popular websites. You can get our larger 10,000 website dataset for free at: https://silatus.com/datasets This dataset includes: High-res… See the full description on the dataset page: https://huggingface.co/datasets/silatus/1k_Website_Screenshots_and_Metadata.
Open Data BR Site Analytics - Top 10 Assets Viewed or Downloaded
data.brla.gov
application/rdfxml +5
Updated Jun 28, 2025
Cite
(2025). Open Data BR Site Analytics - Top 10 Assets Viewed or Downloaded [Dataset]. https://data.brla.gov/dataset/Open-Data-BR-Site-Analytics-Top-10-Assets-Viewed-o/ie4p-gccw
Available download formats: tsv, application/rssxml, json, csv, application/rdfxml, xml
Dataset updated
Jun 28, 2025
Description
This dataset provides detail on how all assets on a domain are being used (e.g. views, downloads, API reads).
User activity is provided by date, asset uid, asset type, asset name, access type and user segment. Please see Site Analytics: Asset Access for more detail about these fields.
The dataset will reflect new Asset Access records within a day of when they occur.
Top 100 HHS Websites [RAW]
healthdata.gov
data.virginia.gov
application/rdfxml +5
Updated Apr 26, 2024
Cite
(2024). Top 100 HHS Websites [RAW] [Dataset]. https://healthdata.gov/dataset/Top-100-HHS-Websites-RAW-/xs6e-ics5
Available download formats: tsv, xml, application/rssxml, csv, json, application/rdfxml
Dataset updated
Apr 26, 2024
Description
This page serves as the backing dataset for the Top 100 HHS Websites, sorted by total page views. Please refer to the story page here for more information: https://healthdata.gov/stories/s/Top-100-HHS-Websites/d84g-3yzd
Dataset of stocks from Top Ships
workwithdata.com
Updated Apr 11, 2025
Cite
Work With Data (2025). Dataset of stocks from Top Ships [Dataset]. https://www.workwithdata.com/datasets/stocks?f=1&fcol0=company&fop0=%3D&fval0=Top+Ships
Dataset updated
Apr 11, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about stocks. It has 1 row and is filtered where the company is Top Ships. It features 8 columns including stock name, company, exchange, and exchange symbol.
Dataset of stocks from Top Spring International
workwithdata.com
Updated Apr 11, 2025
Cite
Work With Data (2025). Dataset of stocks from Top Spring International [Dataset]. https://www.workwithdata.com/datasets/stocks?f=1&fcol0=company&fop0=%3D&fval0=Top+Spring+International
Dataset updated
Apr 11, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about stocks. It has 1 row and is filtered where the company is Top Spring International. It features 8 columns including stock name, company, exchange, and exchange symbol.
Website Top Page Views
data.sugarlandtx.gov
csv
Updated Jun 7, 2025
Cite
Communications and Community Engagement (2025). Website Top Page Views [Dataset]. https://data.sugarlandtx.gov/dataset/website-top-page-views
Available download formats: csv
Dataset updated
Jun 7, 2025
Dataset authored and provided by
Communications and Community Engagement
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An instance of a user visiting a particular page on a website.
Corporate Website — Analytics — Top 100 search terms
data.qld.gov.au
researchdata.edu.au
html
Updated Jul 12, 2025
Cite
Brisbane City Council (2025). Corporate Website — Analytics — Top 100 search terms [Dataset]. https://www.data.qld.gov.au/dataset/corporate-website-analytics-top-100-search-terms
Available download formats: html
Dataset updated
Jul 12, 2025
Dataset authored and provided by
Brisbane City Council
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.

Monthly analytics reports for the Brisbane City Council website

Information regarding the sessions for Brisbane City Council website during the month including search terms used.
‘Fortune 1000’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 13, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Fortune 1000’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-fortune-1000-03c3/b2a55ac6/?iid=026-666&v=presentation
Dataset updated
Nov 13, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Fortune 1000’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/winston56/fortune-500-data-2021 on 13 November 2021.

--- Dataset description provided by original source is as follows ---

Context

Every year Fortune, an American Business Magazine, publishes the Fortune 500, which ranks the top 500 corporations by revenue. This dataset includes the entire Fortune 1000, as opposed to just the top 500.

Content

The Fortune 1000 dataset is from the Fortune website, collected by the processes outlined in this notebook. It contains U.S. company data for the year 2021. The dataset is 1000 rows and 18 columns.

Features

Company - values are the name of the company

Rank - The 2021 rank established by Fortune (1-1000)

Rank Change - The change in the rank from 2020 to 2021. There is only a rank change listed if the company is currently in the top 500 and was previously in the top 500.

Revenue - Revenue of each company in millions. This is the criterion used to rank each company.

Profit - Profit of each company in millions.

Num. of Employees - The number of employees each company employs.

Sector - The sector of the market the company operates in.

City - The city where the company's headquarters is located.

State - The state where the company's headquarters is located

Newcomer - Indicates whether or not the company is new to the top Fortune 500 ("yes" or "no"). No value will be listed for companies outside of the top 500.

CEO Founder - Indicates whether the CEO of the company is also the founder ("yes" or "no").

CEO Woman - Indicates whether the CEO of the company is a woman ("yes" or "no").

Profitable - Indicates whether the company is profitable or not ("yes" or "no").

Prev. Rank - The 2020 rank of the company, as established by Fortune. There will only be previous rank data for the top 500 companies.

CEO - The name of the CEO of the company

Website - The url of the company website

Ticker - The stock ticker symbol of public companies. Some rows will have empty values because the company is a private corporation.

Market Cap - The market cap (or value) of the company in millions. Some rows will have empty values because the company is private. Market valuations were determined on January 20, 2021.

Inspiration

This dataset is made to explore the top corporations in the U.S. Answer questions such as: What percentage of companies have women CEOs? How many companies are newcomers? What percentage of companies have CEOs who were also founders? What role does profitability play in ranking?

--- Original source retains full ownership of the source dataset ---

Traces captured by visiting the top 1500 website

Traffic captured by visiting the top 1500 most visited sites ranked by Alexa
