For more information on CDC.gov metrics, please see http://www.cdc.gov/metrics/
https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source? (A computation sketch follows these questions.)
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
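As an illustration of how such questions can be answered, here is a minimal pandas sketch for the real bounce rate per traffic source, assuming a hypothetical per-session export with trafficSource and pageviews columns (the toy values are not from the actual dataset):

```python
import pandas as pd

# Hypothetical export from the sample dataset: one row per session,
# with its traffic source and the number of pageviews in that session.
sessions = pd.DataFrame({
    "trafficSource": ["google", "google", "direct", "direct", "bing"],
    "pageviews": [1, 5, 1, 1, 3],
})

# Real bounce rate: percentage of sessions with a single pageview.
real_bounce_rate = (
    (sessions["pageviews"] == 1)
    .groupby(sessions["trafficSource"])
    .mean()
    .mul(100)
    .rename("real_bounce_rate_pct")
)
print(real_bounce_rate)
```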
Per the Federal Digital Government Strategy, the Department of Homeland Security Metrics Plan, and the Open FEMA Initiative, FEMA is providing the following web performance metrics with regard to FEMA.gov.

Information in this dataset includes total visits, average visit duration, pageviews, unique visitors, average pages/visit, average time/page, bounce rate, visits by source, visits by social media platform, and metrics on new vs. returning visitors.

External Affairs strives to make all communications accessible. If you have any challenges accessing this information, please contact FEMAWebTeam@fema.dhs.gov.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Víctor Yeste, Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.

Because data from two separate areas had to be integrated (web publishing, and the analysis of shares and related topics on Twitter), we opted for programmatic access to both the Google Analytics v4 Reporting API and the Twitter Standard API, always respecting their limits.

The website analyzed is hellofriki.com, an online media outlet whose primary intention is to meet the need for information on certain topics, publishing a large number of daily items in the form of news as well as analyses, reports, interviews, and many other formats. All this content falls under the sections of cinema, series, video games, literature, and comics.

This dataset contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data have been obtained from each breaking-news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
- status_id: tweet ID
- created_at: date of publication
- text: content of the tweet
- path: URL extracted after processing the shortened URL in text
- post_shared: ID of the WordPress article being shared
- retweet_count: number of retweets
- favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies: automatic Facebook shares, custom tweets without a link to an article, etc.). Same fields as tesis_hometimeline.

tesis_posts: data on articles published by the website and processed for analysis.
- stats_id: analysis ID
- post_id: article ID in WordPress
- post_date: article publication date in WordPress
- post_title: title of the article
- path: URL of the article on the media website
- tags: IDs of the WordPress tags related to the article
- uniquepageviews: unique page views
- entrancerate: entrance rate
- avgtimeonpage: average time on page
- exitrate: exit rate
- pageviewspersession: page views per session
- adsense_adunitsviewed: number of ads viewed by users
- adsense_viewableimpressionpercent: ad display ratio
- adsense_ctr: ad click ratio
- adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed for each published breaking-news item. Fields with statistical values can be computed from the data in the other tables, but totals and averages are saved for faster and easier further processing.
- id: ID of the analysis
- phase: phase of the thesis in which the analysis was carried out (currently all are 1)
- time: "0" if at the time of publication, "1" if 14 days later
- start_date: date and time of the measurement on the day of publication
- end_date: date and time of the measurement made 14 days later
- main_post_id: ID of the published article to be analysed
- main_post_theme: main section of the published article to be analysed
- superheroes_theme: "1" if about superheroes, "0" if not
- trailer_theme: "1" if about a trailer, "0" if not
- name: empty field; a custom name can be added manually
- notes: empty field; personalized notes can be added manually, e.g. that a tag was removed manually for being considered too generic even though the editor had added it
- num_articles: number of articles analysed
- num_articles_with_traffic: number of analysed articles with traffic (those taken into account for traffic analysis)
- num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
- num_terms: number of terms analysed
- uniquepageviews_total: total page views
- uniquepageviews_mean: average page views
- entrancerate_mean: average entrance rate
- avgtimeonpage_mean: average time on page
- exitrate_mean: average exit rate
- pageviewspersession_mean: average page views per session
- total: total ads viewed
- adsense_adunitsviewed_mean: average ads viewed
- adsense_viewableimpressionpercent_mean: average ad display ratio
- adsense_ctr_mean: average ad click ratio
- adsense_ecpm_mean: estimated ad revenue per 1000 page views
- Total: total income
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- terms_ini_num_tweets: total tweets about the terms on the day of publication
- terms_ini_retweet_count_total: total retweets about the terms on the day of publication
- terms_ini_retweet_count_mean: average retweets about the terms on the day of publication
- terms_ini_favorite_count_total: total favorites about the terms on the day of publication
- terms_ini_favorite_count_mean: average favorites about the terms on the day of publication
- terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet about the terms on the day of publication
- terms_ini_user_num_followers_mean: average followers of users who tweeted about the terms on the day of publication
- terms_ini_user_num_tweets_mean: average number of tweets published by users who tweeted about the terms on the day of publication
- terms_ini_user_age_mean: average age in days of users who tweeted about the terms on the day of publication
- terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets about the terms on the day of publication
- terms_end_num_tweets: total tweets about the terms 14 days after publication
- terms_end_retweet_count_total: total retweets about the terms 14 days after publication
- terms_end_retweet_count_mean: average retweets about the terms 14 days after publication
- terms_end_favorite_count_total: total favorites about the terms 14 days after publication
- terms_end_favorite_count_mean: average favorites about the terms 14 days after publication
- terms_end_followers_talking_rate: ratio of followers of the media Twitter account who have recently posted a tweet about the terms 14 days after publication
- terms_end_user_num_followers_mean: average followers of users who tweeted about the terms 14 days after publication
- terms_end_user_num_tweets_mean: average number of tweets published by users who tweeted about the terms 14 days after publication
- terms_end_user_age_mean: average age in days of users who tweeted about the terms 14 days after publication
- terms_end_ur_inclusion_rate: URL inclusion ratio of tweets about the terms 14 days after publication

tesis_terms: data on the terms (tags) related to the processed articles.
- stats_id: analysis ID
- time: "0" if at the time of publication, "1" if 14 days later
- term_id: ID of the term (tag) in WordPress
- name: name of the term
- slug: URL slug of the term
- num_tweets: number of tweets
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet about the term
- user_num_followers_mean: average followers of users talking about the term
- user_num_tweets_mean: average number of tweets published by users talking about the term
- user_age_mean: average age in days of users talking about the term
- url_inclusion_rate: URL inclusion ratio
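As an illustration of the note above that the stored totals and averages are plain aggregates of the per-article values, here is a minimal sketch assuming a local SQLite copy of the database (the engine and file name are assumptions, not stated in the description):

```python
import sqlite3

import pandas as pd

# Hypothetical local copy of the thesis database described above.
conn = sqlite3.connect("tesis.db")
posts = pd.read_sql("SELECT uniquepageviews, avgtimeonpage FROM tesis_posts", conn)

# The *_total and *_mean fields stored in tesis_stats are plain aggregates
# of the per-article values held in tesis_posts:
uniquepageviews_total = posts["uniquepageviews"].sum()
uniquepageviews_mean = posts["uniquepageviews"].mean()
avgtimeonpage_mean = posts["avgtimeonpage"].mean()
print(uniquepageviews_total, uniquepageviews_mean, avgtimeonpage_mean)
```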
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monthly analytics reports for the Brisbane City Council website
Information regarding sessions for the Brisbane City Council website during the month, including page views and unique page views.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract (our paper)
The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.
Data
personal-name.txt.gz:
The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total number of page views from 2008 to 2014.
personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz:
The first column is the period covered, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of the search trend or page view count.
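A minimal sketch of the correlation analysis the abstract describes, assuming both files are tab-separated with the column order above (the separator is an assumption):

```python
import pandas as pd

cols = ["period", "source", "article_id", "keyword", "date", "value"]

# Assumption: tab-separated files with the column order described above;
# pandas decompresses the .gz files automatically.
wiki = pd.read_csv("personal-name_data_wikipedia.txt.gz", sep="\t", names=cols)
google = pd.read_csv("personal-name_data_google-trends.txt.gz", sep="\t", names=cols)

# Align the two sources by keyword and date, then correlate per keyword.
merged = wiki.merge(google, on=["keyword", "date"], suffixes=("_wiki", "_google"))
corr_per_keyword = merged.groupby("keyword").apply(
    lambda g: g["value_wiki"].corr(g["value_google"])
)
print(corr_per_keyword.describe())
```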
Publication
This data set was created for our study. If you make use of this data set, please cite:
Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
http://dx.doi.org/10.1145/2786451.2786495
http://arxiv.org/abs/1509.02218 (author-created version)
Note
The raw data of Wikipedia page views is available on the following page.
http://dumps.wikimedia.org/other/pagecounts-raw/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union".
Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is that percentage higher or lower than the share provided by direct traffic and by search engines via SEO positioning? Which social networks have the greatest impact? And is there any relationship between the specific weight of social networks in a cybermedia outlet's web traffic and circumstances such as the average duration of the user's visit, the number of page views, or the bounce rate (understood in its formal sense of not performing any kind of interaction on the visited page beyond reading its content)?
To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that are part of the European Union after the United Kingdom's departure on December 31, 2020. In each nation we selected five media outlets using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use local metrics by country, since the results obtained with these first two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic.
In all cases we selected cybermedia owned by journalistic companies, ruling out those belonging to telecommunications portals or service providers. Some correspond to classic news companies (both newspapers and television stations), while others are digital natives, without this circumstance affecting the nature of the proposed research.
We then examined the web traffic data of these cybermedia. The period selected covers the months of October, November and December 2021 and January, February and March 2022. We believe that this six-month stretch smooths out possible one-off variations in any single month, reinforcing the precision of the data obtained.
To obtain this data, we used the SimilarWeb tool, currently the most precise tool available for examining the web traffic of a portal, although it is limited to traffic from desktops and laptops and does not take into account traffic from mobile devices, which is currently impossible to determine with the measurement tools on the market.
It includes:
Web traffic general data: average visit duration, pages per visit and bounce rate
Web traffic origin by country
Percentage of traffic generated from social media over total web traffic
Distribution of web traffic generated from social networks
Comparison of web traffic generated from social networks with direct and search procedures
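A minimal sketch of how the second research question could be checked from such data, using hypothetical per-outlet traffic shares (the column names and figures are illustrative only, not the study's actual data):

```python
import pandas as pd

# Hypothetical per-outlet traffic shares (percent of total visits).
traffic = pd.DataFrame({
    "outlet": ["media_a", "media_b", "media_c"],
    "direct_pct": [38.2, 41.0, 35.5],
    "search_pct": [45.1, 40.3, 48.0],
    "social_pct": [9.4, 12.1, 8.2],
})

# Research question 2: is the social share higher or lower than the shares
# from direct traffic and from search engines?
traffic["social_minus_direct"] = traffic["social_pct"] - traffic["direct_pct"]
traffic["social_minus_search"] = traffic["social_pct"] - traffic["search_pct"]
print(traffic)
```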
https://creativecommons.org/publicdomain/zero/1.0/
Time series: A time series is a set of observations recorded at regular intervals of time. Time series analysis can be beneficial in many fields, such as stock market prediction and weather forecasting. It accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be modeled.
Web traffic: The amount of data sent and received by visitors to a website. Sites monitor incoming and outgoing traffic to see which parts or pages of the site are popular and whether there are any apparent trends, such as one specific page being viewed mostly by people in a particular country.
Contains page views for 60k Wikipedia articles in 8 different languages, recorded daily for 2 years.
Data Science Life Cycle (DSLC) diagram: https://i.ibb.co/h1JCgpY/DSLC.png
A Data Science Life Cycle can be used to structure such a project. Forecasting can be done for any interval, provided a sufficient dataset is available. Refer to the GitHub link in the tasks to view the forecast done using ARIMA and Prophet, and feel free to contribute. Several other models, including neural networks, could be used to improve the results many times over.
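A minimal forecasting sketch in the spirit of the ARIMA task, on a synthetic daily page-view series (the series values and model order are illustrative assumptions, not results from this dataset):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical daily page-view series for one article over two years;
# real series would come from the dataset itself.
rng = np.random.default_rng(0)
views = pd.Series(
    rng.poisson(lam=500, size=730).astype(float),
    index=pd.date_range("2015-07-01", periods=730, freq="D"),
)

# Fit a simple ARIMA(1, 1, 1) model and forecast the next 30 days.
model = ARIMA(views, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=30)
print(forecast.head())
```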
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website; multiple requests for the same website on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked highest.
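A minimal loading sketch, assuming the file follows the usual top-1m.csv layout of one rank,domain pair per line with no header row (the file name and layout are assumptions):

```python
import pandas as pd

# Assumption: two-column CSV of rank and domain, e.g. "1,google.com".
top_sites = pd.read_csv("top-1m.csv", names=["rank", "domain"])
print(top_sites.head(10))
```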
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information, divided by month, on accesses made to the online services offered by the opendata portal and provided by the municipality of Milan. The pageviews column represents the total number of web pages that have been viewed within the time frame used. The visits column represents the total visits made, within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information, divided by month, on accesses made to the online services offered by the institutional portal and provided by the municipality of Milan. The pageviews column represents the total number of web pages that have been viewed within the time frame used. The visits column represents the total visits made, within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
The dataset contains information, divided by day, on the accesses made to the online services offered by the opendata portal and provided by the municipality of Milan. The pageviews column represents the total number of web pages that were viewed within the time frame used. The visits column represents the total number of visits made within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
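These Milan portal datasets share the same three columns; a minimal pandas sketch of derived indicators, using illustrative figures rather than the actual data:

```python
import pandas as pd

# Hypothetical extract of the monthly access data described above.
access = pd.DataFrame({
    "month": ["2023-01", "2023-02"],
    "pageviews": [120_000, 115_000],
    "visits": [40_000, 39_000],
    "visitors": [25_000, 24_500],
})

# Two simple derived indicators from the three published columns.
access["pages_per_visit"] = access["pageviews"] / access["visits"]
access["visits_per_visitor"] = access["visits"] / access["visitors"]
print(access)
```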
An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.
The company markets its courses on several websites and search engines such as Google. Once these people land on the website, they might browse the courses, fill out a form for a course, or watch some videos. When they fill out a form providing their email address or phone number, they are classified as a lead. The company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.
Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most promising leads, also known as 'Hot Leads'. If they successfully identify this set of leads, the lead conversion rate should go up, as the sales team will focus on communicating with these potential leads rather than making calls to everyone.
Lead generation works like a funnel: a lot of leads are generated in the initial stage (top), but only a few of them come out as paying customers (bottom). In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, communicating constantly, etc.) in order to get a higher lead conversion.
X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model that assigns a lead score to each lead, such that customers with a higher lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.
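A minimal sketch of such a lead-scoring model, using logistic regression on a hypothetical numeric subset of the variables described below (the toy values and exact column names are illustrative, not the actual dataset):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical numeric subset of the lead attributes listed below.
leads = pd.DataFrame({
    "TotalVisits": [5, 0, 12, 3, 8, 1],
    "TotalTimeSpentOnWebsite": [600, 0, 1500, 200, 900, 50],
    "PageViewsPerVisit": [2.5, 0.0, 4.0, 1.0, 3.2, 1.0],
    "Converted": [1, 0, 1, 0, 1, 0],
})

X = leads.drop(columns="Converted")
y = leads["Converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

# Lead score: predicted conversion probability rescaled to 0-100, so that
# higher scores mean a higher chance of conversion.
lead_scores = (model.predict_proba(X_test)[:, 1] * 100).round().astype(int)
print(lead_scores)
```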
Variables Description
* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer indicating whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer indicating whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization', which means the customer did not select this option while filling out the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking the course.
* Search - Indicates whether the customer had seen the ad in any of the listed items (this applies to the next five variables as well).
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of the lead, based on the data and the intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile.
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* A free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.
UpGrad Case Study
Through its Employment and Financial Services (EFS) division, Seniors, Community and Social Services' (SCSS) programs form a strong foundation of support to help many Albertans find and keep jobs. The ministry provides financial support, employment services, career resources, referrals, information on job fairs and workshops, and local labour market information. The goal is to help individuals and families gain independence by providing opportunities to enhance their skills to get jobs. The alis.alberta.ca website provides employment resources to help Albertans enhance their employability, plan for education and training, make informed career choices, and connect to and be successful in the labour market. This dataset provides information on web traffic statistics for the alis website, including information on pageviews and web sessions, demographic information for web sessions, and traffic information for the alis YouTube channel (https://www.youtube.com/user/ALISwebsite).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information, divided by day, on accesses made to online services offered by the citizen's file and provided by the municipality of Milan. The pageviews column represents the total number of web pages that have been viewed within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking, which subsequently influences Wikipedia content by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
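For illustration, hashing an identifier with SHA-256 as described (the ID below is made up, and any salting in the actual pipeline is not documented here):

```python
import hashlib

# Illustrative only: how a raw Reddit identifier could be hashed with
# SHA-256 for post/comment/user/subreddit anonymity.
raw_id = "t3_abc123"  # hypothetical post ID
hashed_id = hashlib.sha256(raw_id.encode("utf-8")).hexdigest()
print(hashed_id)
```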
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
post_id | TEXT | Unique identifier for the Reddit post. |
created_at | TIMESTAMP | The timestamp when the post was created. |
updated_at | TIMESTAMP | The timestamp when the post was last updated. |
language_code | TEXT | The language code of the post. |
score | INTEGER | The score (upvotes minus downvotes) of the post. |
upvote_ratio | REAL | The ratio of upvotes to total votes. |
gildings | INTEGER | Number of awards (gildings) received by the post. |
num_comments | INTEGER | Number of comments on the post. |
comments
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
post_id | TEXT | The ID of the Reddit post the comment belongs to. |
parent_id | TEXT | The ID of the parent comment (if a reply). |
comment_id | TEXT | Unique identifier for the comment. |
created_at | TIMESTAMP | The timestamp when the comment was created. |
last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
score | INTEGER | The score (upvotes minus downvotes) of the comment. |
upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
gilded | INTEGER | Number of awards (gildings) received by the comment. |
postlinks
Column Name | Type | Description |
---|---|---|
post_id | TEXT | Unique identifier for the Reddit post. |
end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the Reddit post. |
final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
final_url | TEXT | The final URL after redirections. |
redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
commentlinks
Column Name | Type | Description |
---|---|---|
comment_id | TEXT | Unique identifier for the Reddit comment. |
end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the comment. |
final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
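A minimal query sketch against this schema, assuming a local SQLite copy of the database (the file name is hypothetical):

```python
import sqlite3

# Join each post with its extracted links, keeping only valid final URLs.
conn = sqlite3.connect("wikireddit.db")
query = """
SELECT p.post_id, p.created_at, l.final_url
FROM posts AS p
JOIN postlinks AS l ON l.post_id = p.post_id
WHERE l.final_valid = 1
LIMIT 10;
"""
for row in conn.execute(query):
    print(row)
```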
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The dataset contains information, divided by day, on the accesses made to the online services offered by the citizen's file and provided by the municipality of Milan. The pageviews column represents the total number of web pages that were viewed within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset represents 222k samples of web browsing session measurements collected over 2.5 months using the Web View platform (https://webview.orange.com) [1]. Web View allows different probes to automatically execute multiple web sessions in a real end-user environment. In our test campaign, we used 17 machines spread across three locations worldwide (Lannion, Paris, and Mauritius), different ISPs and access technologies (ADSL, WiFi, and fiber) for a total of 9 combinations, and up to 12 browser versions, including various versions of Chrome and Firefox. Each machine can request a different browser viewport, can enable or disable the AdBlock plugin to emulate different user preferences, and can request a specific network protocol (HTTP/1, HTTP/2, or QUIC).

We leverage this dataset to phrase the QoE degradation detection issue as a change point detection problem in [2]. Our results, beyond showing feasibility, warn about the exclusive use of QoE indicators that are very close to content, as changes in the content space can lead to false alarms that are not tied to network-related problems.

If you use these datasets in your research, you can reference the appropriate papers:

[1] A. Saverimoutou, B. Mathieu, and S. Vaton, "Web View: A measurement platform for depicting web browsing performance and delivery," IEEE Communications Magazine, vol. 58, no. 3, pp. 33-39, 2020.

[2] A. Huet, Z. Ben Houidi, B. Mathieu, D. Rossi, "Detecting degradation of web browsing quality of experience," 16th International Conference on Network and Service Management (CNSM), 2020.

Each row represents one experiment, and the columns are as follows:
- wwwName: Target page
- timestamp: Timestamp with format YYYY-MM-DD hh:mm:ss
- browserUsed: Internet browser and version
- requestedProtocol: Requested L7 protocol
- adBlocker: Whether AdBlock is used or not
- networkIface: Network interface
- winSize: Window size
- visiblePortion: Visible portion of the page above the fold, in percent
- h1Share: Share of the traffic coming from HTTP/1, in percent
- h2Share: Share of the traffic coming from HTTP/2, in percent
- hqShare: Share of the traffic coming from QUIC, in percent
- pushShare: Share of the traffic coming from HTTP/2 Server Push, in percent
- nbRes: Number of objects on the page
- nbResNA: Number of objects coming from North America
- nbResSA: Number of objects coming from South America
- nbResEU: Number of objects coming from Europe
- nbResAS: Number of objects coming from Asia
- nbResAF: Number of objects coming from Africa
- nbResOC: Number of objects coming from Oceania
- nbResUKN: Number of objects of unknown provenance
- nbHTTPS: Number of objects delivered over HTTPS
- nbHTTP: Number of objects delivered over HTTP
- nbDomNA: Number of different domain names from North America
- nbDomSA: Number of different domain names from South America
- nbDomEU: Number of different domain names from Europe
- nbDomAS: Number of different domain names from Asia
- nbDomAF: Number of different domain names from Africa
- nbDomOC: Number of different domain names from Oceania
- firstPaint: First paint time (ms)
- tfvr: Time for Full Visual Rendering (ms)
- dom: DOM time (ms)
- plt: Page Load Time (ms)
- machine: Machine name (containing location information)
- categoryType: Category of the web page
- pageSize: Total web page size (bytes)
- receiveTime: Total receive time from the HAR (ms)
- transferRate: Transfer rate (bps)
- id: Unique identification of the current experiment
- config: Identification of the tuple (browserUsed, requestedProtocol, adBlocker, networkIface, winSize, machine, wwwName), i.e. the probe configuration with target wwwName
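Since [2] frames QoE degradation detection as change point detection, here is a minimal sketch using the off-the-shelf ruptures library on a synthetic page-load-time series (the data, penalty, and model choice are illustrative assumptions, not the method of [2]):

```python
import numpy as np
import ruptures as rpt  # generic change point detection library

# Hypothetical page load time series (ms) with a degradation halfway through.
rng = np.random.default_rng(0)
plt_ms = np.concatenate([
    rng.normal(1200, 100, 300),  # healthy period
    rng.normal(2000, 150, 300),  # degraded period
])

# Detect change points in the PLT signal with the PELT algorithm.
algo = rpt.Pelt(model="rbf").fit(plt_ms)
breakpoints = algo.predict(pen=10)
print(breakpoints)  # indices of detected regime changes, plus the series end
```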
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We present a dataset targeting a large set of popular pages (Alexa top-500), from probes in several ISP networks, browser software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments realized in 2019.

We purposely collected two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split between WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from the Alexa top-500 worldwide and China. We varied network conditions: native connections as well as 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements of the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68) and two screen sizes (1920x1080, 1440x900), and employed different browser configurations (half of the experiments activate the AdBlock plugin) over two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the settings diversity, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments. Each archive contains three files and a folder:
- config.csv: Description of parameters and conditions for each run
- metrics.csv: Values of the different metrics collected by the browser
- progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds in steps of 100 milliseconds
- listUrl folder: Indexes the sets of URLs

Regarding config.csv, the columns are:
- index: Index for this set of conditions
- location: Location of the machine
- listUrl: List of URLs, located in the listUrl folder
- browserUsed: Internet browser and version
- terminal: Desktop or Mobile
- collectionEnvironment: Identification of the collection environment
- networkConditionsTrafficShaping (WPT only): Whether native conditions or traffic shaping (4G, FIOS, 3GFast, DSL, or custom emulator conditions)
- networkConditionsBandwidth (WPT only): Bandwidth of the network
- networkConditionsDelay (WPT only): Delay in the network
- networkConditions (WV only): Network conditions
- ipMode (WV only): Requested L3 protocol
- requestedProtocol (WV only): Requested L7 protocol
- adBlocker (WV only): Whether AdBlock is used or not
- winSize (WV only): Window size

Regarding metrics.csv, the columns are:
- id: Unique identification of an experiment (consisting of a 'set of conditions' index and a 'current page' index)
- DOM Content Loaded Event End (ms): DOM time
- First Paint (ms) (WV only): First paint time
- Load Event End (ms): Page Load Time from W3C
- RUM Speed Index (ms) (WV only): RUM Speed Index
- Speed Index (ms) (WPT only): Speed Index
- Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
- Visible portion (%) (WV only): Visible portion
- Time to First Byte (ms) (WPT only): Time to First Byte
- Visually Complete (ms) (WPT only): Visually Complete, used to compute the Speed Index
- aatf, bi_aatf, bi_plt, dom, ii_aatf, ii_plt, last_css, last_img, last_js, nb_ress_css, nb_ress_img, nb_ress_js, num_origins, num_ressources, oi_aatf, oi_plt, plt: the corresponding metrics computed using the ATF-chrome-plugin

Regarding progressionCurves.csv, the columns are:
- id: Unique identification of an experiment (consisting of a 'set of conditions' index and a 'current page' index)
- url: URL of the current page (SUBPAGE stands for a path)
- run: Current run (linked with the config index for WPT)
- filename: Filename of the pcap
- fullname: Full name of the pcap
- har_size: Size of the HAR for this experiment
- pagedata_size: Size of the page data report
- pcap_size: Size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the HAR file (in the browser)
- bytesIn_APP: Total bytes in as seen by the browser
- bytesIn_NET: Total bytes in as seen in the network
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

If you use these datasets in your research, you can reference the appropriate paper:

@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
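A minimal loading sketch, assuming one archive has been extracted locally; the underscore separator inside the experiment id is an assumption, not documented above:

```python
import pandas as pd

# Join each experiment's metrics with its run conditions via the shared index.
config = pd.read_csv("config.csv")
metrics = pd.read_csv("metrics.csv")

# The experiment id combines a 'set of conditions' index and a 'current
# page' index; splitting on "_" is an assumption for illustration.
cond_index = metrics["id"].astype(str).str.split("_", n=1).str[0]
metrics["cond_index"] = pd.to_numeric(cond_index, errors="coerce")

merged = metrics.merge(config, left_on="cond_index", right_on="index", how="left")
print(merged.head())
```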
For more information on CDC.gov metrics, please see http://www.cdc.gov/metrics/