For more information on CDC.gov metrics please see http://www.cdc.gov/metrics/
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Víctor Yeste. Universitat Politècnica de Valencia.The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.In this case, due to the need to integrate data from two separate areas, such as web publishing and the analysis of their shares and related topics on Twitter, has opted for programming as you access both the Google Analytics v4 reporting API and Twitter Standard API, always respecting the limits of these.The website analyzed is hellofriki.com. It is an online media whose primary intention is to solve the need for information on some topics that provide daily a vast number of news in the form of news, as well as the possibility of analysis, reports, interviews, and many other information formats. All these contents are under the scope of the sections of cinema, series, video games, literature, and comics.This dataset has contributed to the elaboration of the PhD Thesis:Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009Data have been obtained from each last-minute news article published online according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:tesis_followers: User ID list of media account followers.tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.status_id: Tweet IDcreated_at: date of publicationtext: content of the tweetpath: URL extracted after processing the shortened URL in textpost_shared: Article ID in WordPress that is being sharedretweet_count: number of retweetsfavorite_count: number of favoritestesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web. Other typologies, automatic Facebook shares, custom tweets without link to an article, etc. With the same fields as tesis_hometimeline.tesis_posts: data of articles published by the web and processed for some analysis.stats_id: Analysis IDpost_id: Article ID in WordPresspost_date: article publication date in WordPresspost_title: title of the articlepath: URL of the article in the middle webtags: Tags ID or WordPress tags related to the articleuniquepageviews: unique page viewsentrancerate: input ratioavgtimeonpage: average visit timeexitrate: output ratiopageviewspersession: page views per sessionadsense_adunitsviewed: number of ads viewed by usersadsense_viewableimpressionpercent: ad display ratioadsense_ctr: ad click ratioadsense_ecpm: estimated ad revenue per 1000 page viewstesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.id: ID of the analysisphase: phase of the thesis in which analysis has been carried out (right now all are 1)time: "0" if at the time of publication, "1" if 14 days laterstart_date: date and time of measurement on the day of publicationend_date: date and time when the measurement is made 14 days latermain_post_id: ID of the published article to be analysedmain_post_theme: Main section of the published article to analyzesuperheroes_theme: "1" if about superheroes, "0" if nottrailer_theme: "1" if trailer, "0" if notname: empty field, possibility to add a custom name manuallynotes: empty field, possibility to add personalized notes manually, as if some tag has been removed manually for being considered too generic, despite the fact that the editor put itnum_articles: number of articles analysednum_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter accountnum_terms: number of terms analyzeduniquepageviews_total: total page viewsuniquepageviews_mean: average page viewsentrancerate_mean: average input ratioavgtimeonpage_mean: average duration of visitsexitrate_mean: average output ratiopageviewspersession_mean: average page views per sessiontotal: total of ads viewedadsense_adunitsviewed_mean: average of ads viewedadsense_viewableimpressionpercent_mean: average ad display ratioadsense_ctr_mean: average ad click ratioadsense_ecpm_mean: estimated ad revenue per 1000 page viewsTotal: total incomeretweet_count_mean: average incomefavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesterms_ini_num_tweets: total tweets on the terms on the day of publicationterms_ini_retweet_count_total: total retweets on the terms on the day of publicationterms_ini_retweet_count_mean: average retweets on the terms on the day of publicationterms_ini_favorite_count_total: total of favorites on the terms on the day of publicationterms_ini_favorite_count_mean: average of favorites on the terms on the day of publicationterms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publicationterms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publicationterms_end_num_tweets: total tweets on terms 14 days after publicationterms_ini_retweet_count_total: total retweets on terms 14 days after publicationterms_ini_retweet_count_mean: average retweets on terms 14 days after publicationterms_ini_favorite_count_total: total bookmarks on terms 14 days after publicationterms_ini_favorite_count_mean: average of favorites on terms 14 days after publicationterms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publicationterms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publicationterms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publicationterms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publicationterms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication.tesis_terms: data of the terms (tags) related to the processed articles.stats_id: Analysis IDtime: "0" if at the time of publication, "1" if 14 days laterterm_id: Term ID (tag) in WordPressname: Name of the termslug: URL of the termnum_tweets: number of tweetsretweet_count_total: total retweetsretweet_count_mean: average retweetsfavorite_count_total: total of favoritesfavorite_count_mean: average of favoritesfollowers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the termuser_num_followers_mean: average followers of users who were talking about the termuser_num_tweets_mean: average number of tweets published by users who were talking about the termuser_age_mean: average age in days of users who were talking about the termurl_inclusion_rate: URL inclusion ratio
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract (our paper)
The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.
Data
personal-name.txt.gz:
The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.
personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz:
The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.
Publication
This data set was created for our study. If you make use of this data set, please cite:
Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
http://dx.doi.org/10.1145/2786451.2786495
http://arxiv.org/abs/1509.02218 (author-created version)
Note
The raw data of Wikipedia page views is available in the following page.
http://dumps.wikimedia.org/other/pagecounts-raw/
A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row sums the amount of users who access a dataset each day, grouped by access type (API Read, Download, Page View, etc).
B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.
C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.
D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets and calculate other metrics around the performance and usage in the open data portal.
Please note a special call-out for two fields: - "derived": This field shows if an asset is an original source (derived = "False") or if it is made from another asset though filtering (derived = "True"). Essentially, if it is derived from another source or not. - "provenance": This field shows if an asset is "official" (created by someone in the city of San Francisco) or "community" (created by a member of the community, not official). All community assets are derived as members of the community cannot add data to the open data portal.
This dataset includes data on how all datasets, stories and derived views (tabular views, visualizations and measures) on a domain are being accessed by users.
The following usage types are included in the Access Type column:Data are updated by a system process at least once a day.
Please see Site Analytics: Asset Access for more detail.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monthly analytics reports for the Brisbane City Council website
Information regarding the sessions for Brisbane City Council website during the month including page views and unique page views.
The Asset Access dataset manages the metadata on how users access all datasets, stories, and derived views (tabular views, story page views, visualization page views and measures page views.) The data provides two key metrics: user's usage types and user segments.
Access counts for Open Data Portal (assets where the URL includes the domain data.mesaaz.gov)
This dataset includes data on how all datasets, stories and derived views (tabular views, visualizations and measures) on a domain are being accessed by users.
The following usage types are included in the Access Type column:Data are updated by a system process at least once a day.
Please see Site Analytics: Asset Access for more detail.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Austin's data portal activity metrics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/data-portal-activity-metricse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Austin's open data portal provides lots of public data about the City of Austin. It also provides portal administrators with behind-the-scenes information about how the portal is used... but that data is mysterious, hard to handle in a spreadsheet, and not located all in one place.
Until now! Authorized city staff used admin credentials to grab this usage data and share it the public. The City of Austin wants to use this data to inform the development of its open data initiative and manage the open data portal more effectively.
This project contains related datasets for anyone to explore. These include site-level metrics, dataset-level metrics, and department information for context. A detailed detailed description of how the files were prepared (along with code) can be found on github here.
Example questions to answer about the data portal
- What parts of the open data portal do people seem to value most?
- What can we tell about who our users are?
- How are our data publishers doing?
- How much data is published programmatically vs manually?
- How data is super fresh? Super stale?
- Whatever you think we should know...
About the files
all_views_20161003.csv
There is a resource available to portal administrators called "Dataset of datasets". This is the export of that resource, and it was captured on Oct 3, 2016. It contains a summary of the assets available on the data portal. While this file contains over 1400 resources (such as views, charts, and binary files), only 363 are actual tabular datasets.
table_metrics_ytd.csv
This file contains information about the 363 tabular datasets on the portal. Activity metrics for an individual dataset can be accessed by calling Socrata's views/metrics API and passing along the dataset's unique ID, a time frame, and admin credentials. The process of obtaining the 363 identifiers, calling the API, and staging the information can be reviewed in the python notebook here.
site_metrics.csv
This file is the export of site-level stats that Socrata generates using a given time frame and grouping preference. This file contains records about site usage each month from Nov 2011 through Sept 2016. By the way, it contains 285 columns... and we don't know what many of them mean. But we are determined to find out!! For a preliminary exploration of the columns and what portal-related business processes to which they might relate, check out the notes in this python notebook here
city_departments_in_current_budget.csv
This file contains a list of all City of Austin departments according to how they're identified in the most recently approved budget documents. Could be helpful for getting to know more about who the publishers are.
crosswalk_to_budget_dept.csv
The City is in the process of standardizing how departments identify themselves on the data portal. In the meantime, here's a crosswalk from the department values observed in
all_views_20161003.csv
to the department names that appear in the City's budgetThis dataset was created by Hailey Pate and contains around 100 samples along with Di Sync Success, Browser Firefox 19, technical information and other features such as: - Browser Firefox 33 - Di Sync Failed - and more.
- Analyze Sf Query Error User in relation to Js Page View Admin
- Study the influence of Browser Firefox 37 on Datasets Created
- More datasets
If you use this dataset in your research, please credit Hailey Pate
--- Original source retains full ownership of the source dataset ---
The dataset contains information, divided by day, on the accesses made to the online services offered by the opendata portal and provided by the municipality of Milan. The pageviews column represents the total number of web pages, which have been displayed within the time frame used. The visits column represents the total number of visits made, within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
The dataset contains information, divided by day, on the accesses made to the online services offered by the citizen's file and provided by the municipality of Milan. The pageviews column represents the total number of web pages, which have been displayed within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
Information about accesses (visits) of city data assets. Combines analytics from both employee (citydata.mesaaz.gov) and public data (data.mesaaz.gov) portals.
The following usage types are included in the Access Type column: grid view – tabular view of the dataset / filtered view primer page view – dataset / filtered view’s homepage, includes metadata and table preview of the data download – download of the dataset / filtered view to CSV, JSON, etc. api read access – programmatic access of dataset/filtered vew, etc. story page view – accessing a story page asset visualization page view – accessing a chart or map asset measure page view – accessing a performance measure asset
Usage data are segmented into the following user types: site member: users who have logged in and have been granted a role on the domain community user: users who have logged in but do not have a role on the domain anonymous: users who have not logged in to the domain Data are updated by a system process at least once a day.
Please see Site Analytics: Asset Access for more detail.
https://brightdata.com/licensehttps://brightdata.com/license
Access our extensive Facebook datasets that provide detailed information on public posts, pages, and user engagement. Gain insights into post performance, audience interactions, page details, and content trends with our ethically sourced data. Free samples are available for evaluation. Over 940M records available Price starts at $250/100K records Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet. 100% ethical and compliant data collection Included datapoints:
Post ID Post Content & URL Date Posted Hashtags Number of Comments Number of Shares Likes & Reaction Counts (by type) Video View Count Page Name & Category Page Followers & Likes Page Verification Status Page Website & Contact Info Is Sponsored Post Attachments (Images/Videos) External Link Data And much more
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information, divided by day, on accesses made to the online services offered by the opendata portal and provided by the municipality of Milan. The pageviews column represents the total number of web pages that have been viewed within the time frame used. The visits column represents the total visits made, within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information, divided by month, on accesses made to the online services offered by the institutional portal and provided by the municipality of Milan. The pageviews column represents the total number of web pages that have been viewed within the time frame used. The visits column represents the total visits made, within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the Biomarker: Vector-Borne Viruses page on the Tempe Wastewater BioIntel Program site.Wastewater collection areas are comprised of merged sewage drainage basins that flow to a shared testing location for the Tempe Wastewater BioIntel Program. The wastewater collection areas represent a geographic area for which virus activity is tested. People infected with a virus excrete the virus in their feces in a process known as “shedding”. The municipal wastewater treatment system (sewage system) collects and aggregates these bathroom contributions across communities. The process begins at sampling site where, over a period of 24 hours, a wastewater sample is collected along the sewer line. After the sample is acquired, it is immediately transferred to a lab where scientists prepare the sample. The laboratory analysis seeks to determine if there is a signal (or detectable presence) of the biomarker in the wastewater. Please see the Tempe Wastewater BioIntel Program site for more information on the wastewater testing process at https://wastewater.tempe.gov/. About the data: These data illustrate a trend of the signal of the weekly average or weekly results of Tempe wastewater biomarker groups. The dashboard and collection area map do not depict the number of individuals infected. Each collection area includes at least one sampling location, which collects wastewater from across the collection area. It does not reflect the specific location where the deposit occurs. While testing can successfully quantify the results, research has not yet determined the relationship between these values and the number of people who are contributing to the signals. The influence of this data on community health decisions in the future is unknown. Data collection is being used to depict overall weekly trends and should not be interpreted without a holistic assessment of public health data. The purpose of this weekly data is to support research as well as to identify overall trends of the genome copies in each liter of wastewater per collection area. We share this information with the public with the disclaimer that only the future can tell how much “diagnostic value” we can and should attribute to the numeric measurements we obtain from the sewer. However, we know what we measure is real and we share that info with our community. Data are shared as the testing results become available. As results may not be released at the same time, testing results for each area may not yet be seen for a given day or week. The dashboard presents the weekly averages. Data are collected from 2-7 days per week. For Collection Area 1, Tempe's wastewater co-mingles with wastewater from a regional sewage line. Tempe's sewage makes up most of Collection Area 1 samples. For Collection Area 3, Tempe's wastewater co-mingles with wastewater from a regional sewage line. For analysis and reporting, Tempe’s wastewater is separated from regional sewage. Week start date represents the starting date of the testing week, which starts on Mondays and ends on Sundays. Additional Information:Source: The Translational Genomics Research Institute (TGen), part of City of Hope, is an Arizona-based, nonprofit medical research institute.Contact: Kimberly SoteloContact email: kimberly_sotelo@tempe.govPreparation Method: Initial values are provided by TGen. Tempe makes additional calculations to determine the weekly averages or weekly results for each biomarker.Publish Frequency: Weekly or as data becomes availablePublish Method: ManualData Dictionary
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains information, divided by month, on accesses made to online services offered by the citizen's file and provided by the municipality of Milan. The pageviews column represents the total number of web pages that have been viewed within the time frame used. The visitors column represents the total number of unique visitors who have accessed the web pages. By unique visitor, we mean a visitor counted only once within the time frame used.
Open Jackson is the City of Jackson's open data portal to find facts, figures, and maps related to our lives within the city. We are working to make this the default technology platform to support the publication of the City's public information, in the form of data, and to make this information easy to find, access, and use by a broad audience. The release of Open Jackson marks the culminating point of our efforts to transition to a transparent government. We will continue our work to curate high-quality and up-to-date datasets and develop a platform that is widely accessible. If you have feedback, please contact [email protected]. In 2015, a new law created the online open data portal to increase transparency and accountability in Jackson by making key information easily accessible and usable to both city officials and citizens. Click here to view the Jackson Open Data Policy. You may use the search bar at the top of the page to find data. Once you find a dataset you would like to download, select the data and view the available download options. Datasets can also be filtered to display only the features of the dataset that you are interested in for download. Data is offered for download in several formats. Spatial and tabular data formats (CSV, KML, shapefile, and JSON) are available for use in GIS and other applications. Mobile users may require additional software to view downloaded data. To edit a shapefile on an iOS device, users will need to unzip the file with an app such as iZip and then open the file in a viewer/editor such as iGIS. By using data made available through this site, the user agrees to all the conditions stated in the following paragraphs as well as the terms and conditions described under the City of Jackson homepage. The data made available has been modified for use from its original source, which is the City of Jackson. The City of Jackson makes no claims as to the completeness, accuracy, timeliness, or content of any data contained in this application; makes no representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a particular use; nor are any such warranties to be implied or inferred with respect to the information or data furnished herein. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the site is being used at one's own risk. The City of Jackson reserves the right to discontinue providing any or all of the data feeds at any time and to require the termination of any and all displaying, distributing or otherwise using any or all of the data for any reason including, without limitation, your violation of any provision of these Terms of Use. If you have questions, suggestions, requests or any other feedback, please contact or email at [email protected]
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains a count of pageviews to the English-language Wikipedia from 2015-03-16T00:00:00 to 2015-04-25T15:59:59, grouped by timestamp (down to a one-second resolution level) and site (mobile or desktop). The smallest number of events in a group is 645; because of this, we are confident there should not be privacy implications of releasing this data.
For more information on CDC.gov metrics please see http://www.cdc.gov/metrics/