64 datasets found

Leading websites worldwide 2024, by monthly visits
statista.com
barnesnoapp.net
Updated Mar 24, 2025
Cite
Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
Dataset updated
Mar 24, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2024
Area covered
Worldwide
Description
In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

The internet leaders: search, social, and e-commerce

Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user-generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

What is next for online content?

Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
Multilingual Scraper of Privacy Policies and Terms of Service
zenodo.org
bin, zip
Updated Apr 24, 2025
Cite
David Bernhard; Luka Nenadic; Stefan Bechtold; Karel Kubicek (2025). Multilingual Scraper of Privacy Policies and Terms of Service [Dataset]. http://doi.org/10.5281/zenodo.14562039
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.14562039
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Bernhard; Luka Nenadic; Stefan Bechtold; Karel Kubicek
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Multilingual Scraper of Privacy Policies and Terms of Service: Scraped Documents of 2024

This dataset supplements the publication "Multilingual Scraper of Privacy Policies and Terms of Service" at ACM CSLAW’25, March 25–27, 2025, München, Germany. It includes the first 12 months of scraped policies and terms from about 800k websites; concrete numbers are listed below.

The following table lists the number of websites visited per month:

Month Number of websites
2024-01 551'148
2024-02 792'921
2024-03 844'537
2024-04 802'169
2024-05 805'878
2024-06 809'518
2024-07 811'418
2024-08 813'534
2024-09 814'321
2024-10 817'586
2024-11 828'662
2024-12 827'101

The number of websites visited should always be higher than the number of jobs (Table 1 of the paper), because a website may redirect, resulting in two websites being scraped, or may have to be retried.

To simplify access, we release the data as large CSV files: per month, there is one file for policies and another for terms. Each file contains all metadata that are usable for the analysis. If your favourite CSV parser reports the same numbers as above, then the dataset has been parsed correctly. We use ‘,’ as the separator, the first row is the header, and strings are in quotes.
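The parsing conventions described above (comma separator, header row, quoted strings) can be sketched with Python's standard `csv` module. This is a minimal illustration under assumptions, not the authors' tooling: the sample rows are made up, and whether the row count or the distinct `website_month_id` count is the number to compare with the table is not stated in the source, so the sketch returns both.

```python
import csv
import io

# Policy and terms text can be long, so raise the per-field limit
# above the csv module's default of 131072 characters.
csv.field_size_limit(2**31 - 1)


def count_rows_and_websites(handle):
    """Return (row count, number of distinct website_month_id values)."""
    reader = csv.DictReader(handle, delimiter=",", quotechar='"')
    rows = 0
    websites = set()
    for record in reader:
        rows += 1
        websites.add(record["website_month_id"])
    return rows, len(websites)


# Tiny made-up sample standing in for one monthly file (hypothetical values).
SAMPLE = (
    '"website_month_id","job_id","policy_content"\n'
    '"1","10","First line\nsecond line, with a comma"\n'
    '"1","11","Redirect target"\n'
    '"2","12","Another ""quoted"" policy"\n'
)

rows, websites = count_rows_and_websites(io.StringIO(SAMPLE))
print(rows, websites)
```

For a real file, the handle would typically come from `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption here. Passing `newline=""` lets the `csv` module handle line breaks inside quoted fields itself.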

Our scraper sometimes collects documents other than policies and terms (for how often this happens, see the evaluation in Sec. 4 of the publication). Such documents might contain personal data, such as addresses of authors of websites that they maintain only for a selected audience. We therefore decided to reduce the risks for websites by anonymizing the data using Presidio. Presidio substitutes personal data with tokens. If your personal data has not been effectively anonymized in the database and you wish for it to be deleted, please contact us.

Preliminaries

The uncompressed dataset is about 125 GB in size, so you will need sufficient storage. This also means that you likely cannot process all the data in memory at once, so we split the data by month and into separate files for policies and terms.
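Because the files are too large to load whole, one option is to stream them row by row and keep only a running aggregate in memory. The sketch below tallies the `website_index_status` column (listed under Shared metadata) across several open files. The sample data is made up, and the function name `status_counts` is hypothetical.

```python
import csv
import io
from collections import Counter

# Long policy/terms fields may exceed the csv module's default field limit.
csv.field_size_limit(2**31 - 1)


def status_counts(handles):
    """Stream rows from each handle and tally website_index_status values."""
    counts = Counter()
    for handle in handles:
        # DictReader yields one row at a time, so memory use stays small.
        for record in csv.DictReader(handle):
            counts[record["website_index_status"]] += 1
    return counts


# Two made-up "monthly files" (hypothetical values).
MONTH_A = '"website_month_id","website_index_status"\n"1","OK"\n"2","TIMEOUT"\n'
MONTH_B = '"website_month_id","website_index_status"\n"3","OK"\n'

counts = status_counts([io.StringIO(MONTH_A), io.StringIO(MONTH_B)])
print(counts.most_common())
```

The same pattern applies to any per-column statistic that can be updated incrementally, such as language counts from `website_lang`.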

Files and structure

The files have the following names:

2024_policy.csv for policies

2024_terms.csv for terms

Shared metadata

Both files contain the following metadata columns:

website_month_id - identification of crawled website

job_id - one website can have multiple jobs in case of redirects (but most commonly has only one)

website_index_status - network state of loading the index page. This is resolved by the Chrome DevTools Protocol.

DNS_ERROR - domain cannot be resolved

OK - all fine

REDIRECT - domain redirects somewhere else

TIMEOUT - the request timed out

BAD_CONTENT_TYPE - 415 Unsupported Media Type

HTTP_ERROR - 404 error

TCP_ERROR - error in the network connection

UNKNOWN_ERROR - unknown error

website_lang - language of the index page, detected using the langdetect library

website_url - the URL of the website sampled from the CrUX list (may contain subdomains, etc). Use this as a unique identifier for connecting data between months.

job_domain_status - indicates the status of loading the index page. Can be:

OK - all works well (at the moment, should be all entries)

BLACKLISTED - URL is on our list of blocked URLs

UNSAFE - website is not safe according to the Safe Browsing API by Google

LOCATION_BLOCKED - country is in the list of blocked countries

job_started_at - when the visit of the website was started

job_ended_at - when the visit of the website was ended

job_crux_popularity - JSON with all popularity ranks of the website this month

job_index_redirect - when we detect that the domain redirects us, we stop the crawl and create a new job with the target URL. This saves time if many websites redirect to one target, as it will be crawled only once. The index_redirect is then the job.id corresponding to the redirect target.

job_num_starts - number of crawlers that started this job (counts restarts in case of an unsuccessful crawl, max is 3)

job_from_static - whether this job was included in the static selection (see Sec. 3.3 of the paper)

job_from_dynamic - whether this job was included in the dynamic selection (see Sec. 3.3 of the paper) - this is not exclusive with from_static - both can be true when the lists overlap.

job_crawl_name - our name of the crawl, contains year and month (e.g., 'regular-2024-12' for regular crawls, in Dec 2024)
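As a rough illustration of how two of the columns above might be used, the sketch below follows `job_index_redirect` links to the job that was actually crawled and decodes the `job_crux_popularity` JSON. All rows are made up. Representing "no redirect" as an empty string and the shape of the popularity JSON are assumptions, since the source does not specify either.

```python
import json


def resolve_redirect(job_id, redirect_of):
    """Follow job_index_redirect links until reaching a job with no redirect."""
    seen = {job_id}
    current = job_id
    while redirect_of.get(current):
        target = redirect_of[current]
        if target in seen:  # stop on a redirect loop instead of spinning forever
            break
        seen.add(target)
        current = target
    return current


# Made-up rows; an empty string is assumed to mean "no redirect",
# and the JSON layout is hypothetical.
ROWS = [
    {"job_id": "10", "job_index_redirect": "11", "job_crux_popularity": '{"2024-12": 1000}'},
    {"job_id": "11", "job_index_redirect": "12", "job_crux_popularity": '{"2024-12": 5000}'},
    {"job_id": "12", "job_index_redirect": "", "job_crux_popularity": '{"2024-12": 1000}'},
]

redirect_of = {row["job_id"]: row["job_index_redirect"] for row in ROWS}
final_job = resolve_redirect("10", redirect_of)
popularity = json.loads(ROWS[0]["job_crux_popularity"])
print(final_job, popularity)
```

Whether redirects can chain across several jobs in the released data is not stated. The loop handles that case, and a direct redirect is simply a chain of length one.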

Policy data

policy_url_id - ID of the URL this policy has

policy_keyword_score - score (higher is better), according to the crawler's keyword list, that the given document is a policy

policy_ml_probability - probability assigned by the BERT model that the given document is a policy

policy_consideration_basis - the basis on which we decided that this URL is a policy. The following three options are executed by the crawler in this order:

'keyword matching' - this policy was found using the crawler navigation (which is based on keywords)

'search' - this policy was found using a search engine

'path guessing' - this policy was found by using well-known URLs like example.com/policy

policy_url - full URL to the policy

policy_content_hash - used as identifier - if the document remained the same between crawls, it won't create a new entry

policy_content - contains the text of policies and terms extracted to Markdown using Mozilla's readability library

policy_lang - language of the content, detected by fasttext
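One way these columns could be combined is to keep only rows the classifier considers likely policies and to drop repeated documents by `policy_content_hash`. The sketch below assumes the probability is stored as a decimal string. The 0.5 threshold is arbitrary and not taken from the paper, and the sample records are made up.

```python
def select_policies(records, min_probability=0.5):
    """Keep likely policies, one row per distinct policy_content_hash."""
    seen_hashes = set()
    selected = []
    for record in records:
        # Assumed format: probability written as a decimal string, e.g. "0.93".
        if float(record["policy_ml_probability"]) < min_probability:
            continue
        if record["policy_content_hash"] in seen_hashes:
            continue  # same document content already kept
        seen_hashes.add(record["policy_content_hash"])
        selected.append(record)
    return selected


# Made-up records (hypothetical hashes and URLs).
RECORDS = [
    {"policy_url": "https://a.example/privacy", "policy_ml_probability": "0.93", "policy_content_hash": "h1"},
    {"policy_url": "https://b.example/privacy", "policy_ml_probability": "0.97", "policy_content_hash": "h1"},
    {"policy_url": "https://c.example/about", "policy_ml_probability": "0.08", "policy_content_hash": "h2"},
]

kept = select_policies(RECORDS)
print([record["policy_url"] for record in kept])
```

A suitable threshold depends on the analysis; the evaluation in the publication is the place to look for how the model's probabilities behave.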

Terms data

Analogous to the policy data; the columns use the terms prefix instead of policy.

Updates

Check this Google Docs for an updated version of this README.md.
Number of page views per web session 2022, by vertical & device
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Number of page views per web session 2022, by vertical & device [Dataset]. https://www.statista.com/statistics/1106552/number-of-visits-website-before-checkout/
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
Websites in the energy, utilities, and construction sector averaged the largest amount of visits per online session worldwide. In the fourth quarter of 2022, desktop users in that segment visited around ***** pages per online session. Travel and hospitality ranked second, with an average of almost *** pages visited. In terms of mobile users, travel and hospitality registered the highest number of page views, followed by retail.
FishSounds Website Data Repository
borealisdata.ca
search.dataone.org
+1 more
Updated Oct 28, 2024
Cite
Audrey Looby; Amalis Riera; Sarah Vela; Kieran Cox; Santiago Bravo; Rodney Rountree; Francis Juanes; Laura K. Reynolds; Charles W. Martin (2024). FishSounds Website Data Repository [Dataset]. http://doi.org/10.5683/SP2/TACOUX
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP2/TACOUX
Dataset updated
Oct 28, 2024
Dataset provided by
Borealis
Authors
Audrey Looby; Amalis Riera; Sarah Vela; Kieran Cox; Santiago Bravo; Rodney Rountree; Francis Juanes; Laura K. Reynolds; Charles W. Martin
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
underwater ecosystems, global
Description
FishSounds presents a compilation of acoustic recordings and published information on sound production across all extant fish species globally. We hope this information can be used to advance research into fish behavior, passive acoustic monitoring, and human impacts on underwater soundscapes as well as serve as a public resource for anyone interested in learning more about fish sounds. This work is the product of an international collaboration between researchers and developers from five organizations. We have taken a cross-disciplinary approach, combining expertise in fish ecology, bioacoustics, and data management to produce a website that we hope will serve the wider marine research community. This Dataverse dataset serves as a permanent repository for all versions of the FishSounds website and associated publications and products. Please see the latest version for the most detailed methodology and data, though the other versions are available for reference. All of the data provided here may be more easily viewed and searched at FishSounds.net. We will be continuing to update and add to FishSounds.net and this repository, so if you would like to suggest an edit or contribute a reference or associated fish sound recording, please contact us at fishsoundscontact@gmail.com.
Data for: Debating Algorithmic Fairness
data.qdr.syr.edu
Updated Nov 13, 2023
Cite
Melissa Hamilton (2023). Data for: Debating Algorithmic Fairness [Dataset]. http://doi.org/10.5064/F6JOQXNF
Available download formats: pdf(53179), pdf(63339), pdf(285052), pdf(103333), application/x-json-hypothesis(55745), pdf(256399), jpeg(101993), pdf(233414), pdf(536400), pdf(786428), pdf(2243113), pdf(109638), pdf(176988), pdf(59204), pdf(124046), pdf(802960), pdf(82120)
Unique identifier
https://doi.org/10.5064/F6JOQXNF
Dataset updated
Nov 13, 2023
Dataset provided by
Qualitative Data Repository
Authors
Melissa Hamilton
License
https://qdr.syr.edu/policies/qdr-standard-access-conditions
Time period covered
2008 - 2017
Area covered
United States
Description
This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the Publisher's Website.

Data Generation

The research project engages a story about perceptions of fairness in criminal justice decisions. The specific focus involves a debate between ProPublica, a news organization, and Northpointe, the owner of a popular risk tool called COMPAS. ProPublica wrote that COMPAS was racist against blacks, while Northpointe posted online a reply rejecting such a finding. These two documents were the obvious foci of the qualitative analysis because of the further media attention they attracted, the confusion their competing conclusions caused readers, and the power both companies wield in public circles. There were no barriers to retrieval as both documents have been publicly available on their corporate websites. This public access was one of the motivators for choosing them as it meant that they were also easily attainable by the general public, thus extending the documents’ reach and impact. Additional materials from ProPublica relating to the main debate were also freely downloadable from its website and a third party, open source platform. Access to secondary source materials comprising additional writings from Northpointe representatives that could assist in understanding Northpointe’s main document, though, was more limited. Because of a claim of trade secrets on its tool and the underlying algorithm, it was more difficult to reach Northpointe’s other reports. Nonetheless, largely because its clients are governmental bodies with transparency and accountability obligations, some of Northpointe-associated reports were retrievable from third parties who had obtained them, largely through Freedom of Information Act queries. Together, the primary and (retrievable) secondary sources allowed for a triangulation of themes, arguments, and conclusions.

The quantitative component uses a dataset of over 7,000 individuals with information that was collected and compiled by ProPublica and made available to the public on github. ProPublica’s gathering the data directly from criminal justice officials via Freedom of Information Act requests rendered the dataset in the public domain, and thus no confidentiality issues are present. The dataset was loaded into SPSS v. 25 for data analysis.

Data Analysis

The qualitative enquiry used critical discourse analysis, which investigates ways in which parties in their communications attempt to create, legitimate, rationalize, and control mutual understandings of important issues. Each of the two main discourse documents was parsed on its own merit. Yet the project was also intertextual in studying how the discourses correspond with each other and to other relevant writings by the same authors. Several more specific types of discursive strategies were of interest in attracting further critical examination:

- Testing claims and rationalizations that appear to serve the speaker’s self-interest
- Examining conclusions and determining whether sufficient evidence supported them
- Revealing contradictions and/or inconsistencies within the same text and intertextually
- Assessing strategies underlying justifications and rationalizations used to promote a party’s assertions and arguments
- Noticing strategic deployment of lexical phrasings, syntax, and rhetoric
- Judging sincerity of voice and the objective consideration of alternative perspectives

Of equal importance in a critical discourse analysis is consideration of what is not addressed, that is to uncover facts and/or topics missing from the communication. For this project, this included parsing issues that were either briefly mentioned and then neglected, asserted yet the significance left unstated, or not suggested at all. This task required understanding common practices in the algorithmic data science literature.

The paper could have been completed with just the critical discourse analysis. However, because one of the salient findings from it highlighted that the discourses overlooked numerous definitions of algorithmic fairness, the call to fill this gap seemed obvious. Then, the availability of the same dataset used by the parties in conflict, made this opportunity more appealing. Calculating additional algorithmic equity equations would not thereby be troubled by irregularities because of diverse sample sets. New variables were created as relevant to calculate algorithmic fairness equations. In addition to using various SPSS Analyze functions (e.g., regression, crosstabs, means), online statistical calculators were useful to compute z-test comparisons of proportions and t-test comparisons of means.

Logic of Annotation

Annotations were employed to fulfil a variety of functions, including supplementing the main text with context, observations, counter-points, analysis, and source attributions. These fall under a few categories. Space considerations. Critical discourse analysis offers a rich method...
(Dataset) The most visited health websites in the world
data.mendeley.com
narcis.nl
Updated Jan 11, 2021
Cite
Patricia Acosta-Vargas (2021). (Dataset) The most visited health websites in the world [Dataset]. http://doi.org/10.17632/n468trh5my.1
Unique identifier
https://doi.org/10.17632/n468trh5my.1
Dataset updated
Jan 11, 2021
Authors
Patricia Acosta-Vargas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Evaluation of the most visited health websites in the world
Display of recently viewed products on fashion e-commerce sites in Australia...
statista.com
Updated Jul 9, 2025
Cite
Statista (2025). Display of recently viewed products on fashion e-commerce sites in Australia 2021 [Dataset]. https://www.statista.com/statistics/1269531/australia-recently-viewed-products-on-e-commerce-sites/
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 10, 2021 - Jun 22, 2021
Area covered
Australia
Description
In 2021, a review of the online browsing experience on e-commerce fashion sites showed that ** percent of the websites surveyed did not have a recently viewed products function. Overall, just ** percent of the websites had recently viewed product data on more than one page.
Alexa, International Top 100 Websites, Global, 10.12.2007
geocommons.com
Updated Apr 29, 2008
Cite
Alexa (2008). Alexa, International Top 100 Websites, Global, 10.12.2007 [Dataset]. http://geocommons.com/search.html
Dataset updated
Apr 29, 2008
Dataset provided by
data
Alexa
Description
This dataset shows the Alexa Top 100 International Websites and provides metrics on the volume of traffic that these sites were able to handle. The Alexa Top 100 lists the 100 most visited websites in the world and measures various statistical information. I looked up each site's headquarters, either through Alexa or a Whois lookup, to get a street address, which I was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. The source of the data was Alexa.com (source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none). The data is from October 12, 2007. Alexa is updated daily, so to get more up-to-date information, visit their site directly. They don't have maps, though.
City Website Analytics
data.ccrpc.org
csv, json, rdf, xml
Updated Aug 3, 2022
Cite
City of Urbana (2022). City Website Analytics [Dataset]. https://data.ccrpc.org/dataset/city-website-analytics
Available download formats: csv, json, xml, rdf
Dataset updated
Aug 3, 2022
Dataset provided by
data.urbanaillinois.us
Authors
City of Urbana
Description
Information about pages on the City's website including their age and their Google Analytics data (everything from "PageViews" and to the right). If the Google Analytics fields are empty, the page hasn't been visited recently at all.
Visited web pages of the City (by title) in 2015
data.wu.ac.at
opendata.brussels.be
csv, json, xls
Updated Mar 14, 2016
+ more versions
Cite
City of Brussels/Web Unit (2016). Visited web pages of the City (by title) in 2015 [Dataset]. https://data.wu.ac.at/schema/opendata_brussels_be/dmlzaXRlZC13ZWItcGFnZXMtb2YtdGhlLWNpdHktYnktdXJsLWluLTIwMTQ=
Available download formats: json, csv, xls
Dataset updated
Mar 14, 2016
Dataset provided by
City of Brussels/Web Unit
Description
Visitor statistics of the pages of the website of the City of Brussels (2015) by title, the number of pages viewed, the number of unique visits. Source: Google Analytics.
amazon.com Traffic Analytics Data
analytics.explodingtopics.com
Updated Aug 1, 2025
+ more versions
Cite
(2025). amazon.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/amazon.com
Dataset updated
Aug 1, 2025
Variables measured
Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
Description
Traffic analytics, rankings, and competitive metrics for amazon.com as of August 2025
Share of global mobile website traffic 2015-2025
statista.com
Updated Sep 11, 2025
Cite
Statista (2025). Share of global mobile website traffic 2015-2025 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
Dataset updated
Sep 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
In the second quarter of 2025, mobile devices (excluding tablets) accounted for 62.54 percent of global website traffic. Since consistently maintaining a share of around 50 percent beginning in 2017, mobile usage surpassed this threshold in 2020 and has demonstrated steady growth in its dominance of global web access.

Mobile traffic

Due to low infrastructure and financial restraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.

Mobile usage

The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
Visitors Statistics MFSR - Number of website pages viewed (daily)
data.europa.eu
Cite
Ministerstvo financií SR, Visitors Statistics MFSR - Number of website pages viewed (daily) [Dataset]. https://data.europa.eu/data/datasets/https-opendata-mfsr-sk-opendata-catalog-statistika-navstevnosti-mfsr-pocet-zobrazenych-stranok-webu-denne?locale=en
Dataset authored and provided by
Ministerstvo financií SR
Description
Visitors Statistics MFSR - Number of website pages viewed (daily)
Data from: Improving the efficacy of web-based educational outreach in...
zenodo.org
data.niaid.nih.gov
+1 more
csv, txt
Updated Jun 1, 2022
Cite
Gregory R. Goldsmith; Andrew D. Fulton; Colin D. Witherill; Javier F. Espeleta (2022). Data from: Improving the efficacy of web-based educational outreach in ecology [Dataset]. http://doi.org/10.5061/dryad.94nk8
Available download formats: csv, txt
Unique identifier
https://doi.org/10.5061/dryad.94nk8
Dataset updated
Jun 1, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gregory R. Goldsmith; Andrew D. Fulton; Colin D. Witherill; Javier F. Espeleta
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Scientists are increasingly engaging the web to provide formal and informal science education opportunities. Despite the prolific growth of web-based resources, systematic evaluation and assessment of their efficacy remains limited. We used clickstream analytics, a widely available method for tracking website visitors and their behavior, to evaluate >60,000 visits over three years to an educational website focused on ecology. Visits originating from search engine queries were a small proportion of the traffic, suggesting the need to actively promote websites to drive visitation. However, the number of visits referred to the website per social media post varied depending on the social media platform and the quality of those visits (e.g., time on site and number of pages viewed) was significantly lower than visits originating from other referring websites. In particular, visitors referred to the website through targeted promotion (e.g., inclusion in a website listing classroom teaching resources) had higher quality visits. Once engaged in the site's core content, visitor retention was high; however, visitors rarely used the tutorial resources that serve to explain the site's use. Our results demonstrate that simple changes in website design, content and promotion are likely to increase the number of visitors and their engagement. While there is a growing emphasis on using the web to broaden the impacts of biological research, time and resources remain limited. Clickstream analytics provides an easily accessible, relatively fast and quantitative means by which those engaging in educational outreach can improve upon their efforts.
March Madness Historical DataSet (2002 to 2025)
kaggle.com
Updated Apr 22, 2025
Cite
Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 22, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jonathan Pilafas
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo. The dashboard is featured in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard

This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for March Madness analysis.

Key features of the dataset include:
- Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
- Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
- 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset. Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed under one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field in its full length and respective acronyms they are known by as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences. From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths because the active current coaching length typically correlates to a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and I join another reference table to differentiate the teams who were ranked in the top 12 in the AP Top 25 during week 6 of the respective NCAA season. 
After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
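The join step described above (combining per-section tables on team name and season) was carried out in Domo's Magic ETL. The snippet below is only a plain-Python sketch of the same idea. The column names `team`, `season`, `adj_em`, and `efg_pct` and all values are hypothetical, and it keeps every key it sees (a full outer join), which may differ from the join type actually used.

```python
def join_on_team_season(*tables):
    """Merge rows from several tables that share the same (team, season) key."""
    merged = {}
    for table in tables:
        for row in table:
            key = (row["team"], row["season"])
            # Create the combined row on first sight, then add this table's columns.
            merged.setdefault(key, {}).update(row)
    return list(merged.values())


# Made-up stand-ins for two of the intermediate tables (hypothetical columns).
efficiency = [
    {"team": "Team A", "season": 2025, "adj_em": 1.0},
]
four_factors = [
    {"team": "Team A", "season": 2025, "efg_pct": 50.0},
    {"team": "Team B", "season": 2025, "efg_pct": 48.0},
]

combined = join_on_team_season(efficiency, four_factors)
print(combined)
```

A team-name mapping table, as mentioned in the description, would be applied before this step so that the same team always produces the same key.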

This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
Website statistics—Community support
data.qld.gov.au
researchdata.edu.au
csv
Updated Apr 24, 2021
Cite
Communities, Housing and Digital Economy (2021). Website statistics—Community support [Dataset]. https://www.data.qld.gov.au/dataset/website-statistics-community-support
Available download formats: csv(15.5 KiB), csv(29 KiB), csv(10 KiB), csv(21 KiB), csv(16 KiB), csv(26.5 KiB), csv(12 KiB), csv(25.5 KiB), csv(15 KiB), csv(17.5 KiB), csv(11.5 KiB), csv(14 KiB)
Dataset updated
Apr 24, 2021
Dataset authored and provided by
Communities, Housing and Digital Economy
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Monthly statistics for pages viewed by visitors to the Queensland Government website—Community support franchise. Source: Google Analytics
Most Popular Apps (2025)
businessofapps.com
Updated Jul 28, 2025
Cite
Business of Apps (2025). Most Popular Apps (2025) [Dataset]. https://www.businessofapps.com/data/most-popular-apps/
Dataset updated
Jul 28, 2025
Dataset authored and provided by
Business of Apps
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
The pendulum swung in 2022 with app downloads stagnating, after two years of solid growth under the pandemic. In 2023, some categories saw growth while others continued to stagnate, as users shifted...
ARCHITRAVE [map visualization : data & software]
heidata.uni-heidelberg.de
application/gzip, pdf
Updated Oct 22, 2021
Cite
Hendrik Ziegler; Alexandra Pioch (2021). ARCHITRAVE [map visualization : data & software] [Dataset]. http://doi.org/10.11588/DATA/AT1QUR
Available download formats: pdf(241144), application/gzip(914689)
Unique identifier
https://doi.org/10.11588/DATA/AT1QUR
Dataset updated
Oct 22, 2021
Dataset provided by
heiDATA
Authors
Hendrik Ziegler; Alexandra Pioch
License
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.11588/DATA/AT1QUR
Time period covered
1685 - 1723
Area covered
Germany; Poland; Paris, France; France; Netherlands; Spain; Versailles, France; Italy; Belgium
Dataset funded by
DFG-ANR
Description
The dataset includes cartographic visualization data and software designed, implemented, and published for the ARCHITRAVE research project website. The research focused on the edition, executed in German and French, of six travelogues by German travelers of the Baroque period who visited Paris and Versailles. The edited texts are published in the Textgrid repository. For all further information on the content and objectives of the research, please refer to the website (https://architrave.eu/) and given literature. Three visualizations were created for the website: the travel stops of five of the travelers on their way to Paris and Versailles; the sites in Europe mentioned in the six travelogues; and the sites in Paris described by the six travelers. The visualizations were implemented with Leaflet.js. The dataset contains scripts for data crunching, processed geodata, scripts for Leaflet.js, a License, and a README.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Available download formats: zip(0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
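
As a starting point for the bounce-rate question above, the definition (the percentage of visits with a single pageview, grouped by traffic source) can be sketched in plain Python. This is a minimal illustration, not a query against the BigQuery table: the session tuples and source names are made up, and the function name `bounce_rate_by_source` is hypothetical.

```python
from collections import defaultdict

# Made-up sessions as (traffic_source, pageviews); not taken from the dataset.
sessions = [
    ("google", 1),
    ("google", 4),
    ("direct", 1),
    ("direct", 1),
    ("youtube", 3),
]


def bounce_rate_by_source(sessions):
    """Return the percentage of single-pageview visits for each source."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in sessions:
        totals[source] += 1
        if pageviews == 1:  # a "bounce" is a visit with exactly one pageview
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / totals[source]
            for source in totals}


rates = bounce_rate_by_source(sessions)
```

With the sample data, half of the "google" visits are bounces, all of the "direct" visits are, and none of the "youtube" visits are.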
Arizona State University Twitter Data Set
academictorrents.com
bittorrent
Updated Dec 23, 2013
Cite
R. Zafarani and H. Liu (2013). Arizona State University Twitter Data Set [Dataset]. https://academictorrents.com/details/2399616d26eeb4ae9ac3d05c7fdd98958299efa9
Available download formats: bittorrent(354770146)
Dataset updated
Dec 23, 2013
Dataset authored and provided by
R. Zafarani and H. Liu
License
https://academictorrents.com/nolicensespecified
Description
Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging, and SMS messaging all rolled into one neat and simple package. It's a new and easy way to discover the latest news related to subjects you care about.

|Attribute|Value|
|-|-|
|Number of Nodes:|11316811|
|Number of Edges:|85331846|
|Missing Values?|no|
|Source:|N/A|

##Data Set Information:

1. nodes.csv: the file of all the users. This file works as a dictionary of all the users in this data set. It's useful for fast reference. It contains all the node ids used in the dataset.
2. edges.csv: the friendship/followership network among the users. The friends/followers are represented using edges. Edges are directed. Here is an example: 1,2. This means the user with id "1" is following the user with id "2".

##Attribute Information:

Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging and sms messaging all rolled into one ne
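
The directed edge format described above (a line `1,2` meaning user "1" follows user "2") can be read into follower and following lookups along these lines. This is a minimal sketch, assuming the file has no header row and exactly two comma-separated ids per line; the sample edges and the function name `load_following` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the edges.csv layout; the real file has about 85 million rows.
sample_edges = "1,2\n1,3\n2,3\n"


def load_following(handle):
    """Build who-follows-whom lookups from directed 'source,target' rows."""
    following = defaultdict(set)  # user id -> ids that user follows
    followers = defaultdict(set)  # user id -> ids that follow that user
    for source, target in csv.reader(handle):
        following[source].add(target)
        followers[target].add(source)
    return following, followers


following, followers = load_following(io.StringIO(sample_edges))
```

For the full file, the same function can be passed an open file handle instead of the `io.StringIO` sample, though holding both lookups in memory for tens of millions of edges may need a more compact representation.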

Month	Number of websites
2024-01	551'148
2024-02	792'921
2024-03	844'537
2024-04	802'169
2024-05	805'878
2024-06	809'518
2024-07	811'418
2024-08	813'534
2024-09	814'321
2024-10	817'586
2024-11	828'662
2024-12	827'101

Leading websites worldwide 2024, by monthly visits

96 scholarly articles cite this dataset (View in Google Scholar)

