46 datasets found
  1. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Googlehttp://google.com/
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?

  2. Google Analytics Sample

    • console.cloud.google.com
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=ko, Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/details/obfuscated-ga360-data/obfuscated-ga360-data?filter=solution-type%3Adataset&hl=ko
    Explore at:
    Dataset provided by
    Googlehttp://google.com/
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery

  3. Google Ads sales dataset

    • kaggle.com
    Updated Jul 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NayakGanesh007 (2025). Google Ads sales dataset [Dataset]. https://www.kaggle.com/datasets/nayakganesh007/google-ads-sales-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 22, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    NayakGanesh007
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Google Ads Sales Dataset for Data Analytics Campaigns (Raw & Uncleaned) 📝 Dataset Overview This dataset contains raw, uncleaned advertising data from a simulated Google Ads campaign promoting data analytics courses and services. It closely mimics what real digital marketers and analysts would encounter when working with exported campaign data — including typos, formatting issues, missing values, and inconsistencies.

    It is ideal for practicing:

    Data cleaning

    Exploratory Data Analysis (EDA)

    Marketing analytics

    Campaign performance insights

    Dashboard creation using tools like Excel, Python, or Power BI

    📁 Columns in the Dataset Column Name ----- -Description Ad_ID --------Unique ID of the ad campaign Campaign_Name ------Name of the campaign (with typos and variations) Clicks --Number of clicks received Impressions --Number of ad impressions Cost --Total cost of the ad (in ₹ or $ format with missing values) Leads ---Number of leads generated Conversions ----Number of actual conversions (signups, sales, etc.) Conversion Rate ---Calculated conversion rate (Conversions ÷ Clicks) Sale_Amount ---Revenue generated from the conversions Ad_Date------ Date of the ad activity (in inconsistent formats like YYYY/MM/DD, DD-MM-YY) Location ------------City where the ad was served (includes spelling/case variations) Device------------ Device type (Mobile, Desktop, Tablet with mixed casing) Keyword ----------Keyword that triggered the ad (with typos)

    ⚠️ Data Quality Issues (Intentional) This dataset was intentionally left raw and uncleaned to reflect real-world messiness, such as:

    Inconsistent date formats

    Spelling errors (e.g., "analitics", "anaytics")

    Duplicate rows

    Mixed units and symbols in cost/revenue columns

    Missing values

    Irregular casing in categorical fields (e.g., "mobile", "Mobile", "MOBILE")

    🎯 Use Cases Data cleaning exercises in Python (Pandas), R, Excel

    Data preprocessing for machine learning

    Campaign performance analysis

    Conversion optimization tracking

    Building dashboards in Power BI, Tableau, or Looker

    💡 Sample Analysis Ideas Track campaign cost vs. return (ROI)

    Analyze click-through rates (CTR) by device or location

    Clean and standardize campaign names and keywords

    Investigate keyword performance vs. conversions

    🔖 Tags Digital Marketing · Google Ads · Marketing Analytics · Data Cleaning · Pandas Practice · Business Analytics · CRM Data

  4. S

    Open Data KC Google Analytics Data

    • splitgraph.com
    Updated Dec 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    kcmo (2023). Open Data KC Google Analytics Data [Dataset]. https://www.splitgraph.com/kcmo/open-data-kc-google-analytics-data-cxg5-cfac
    Explore at:
    application/vnd.splitgraph.image, application/openapi+json, jsonAvailable download formats
    Dataset updated
    Dec 29, 2023
    Authors
    kcmo
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Description

    This dataset contains basis performance data from data.kcmo.org. The data is tracked via google analytics.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  5. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Taiwan, British Indian Ocean Territory, Canada, Moldova (Republic of), Andorra, Isle of Man, Nepal, Northern Mariana Islands, Bangladesh, Tunisia
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  6. f

    Comparison of user, site, and network-centric approaches to web analytics...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Comparison of user, site, and network-centric approaches to web analytics data collection showing advantages, disadvantages, and examples of each approach at the time of the study. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of user, site, and network-centric approaches to web analytics data collection showing advantages, disadvantages, and examples of each approach at the time of the study.

  7. My Google Data Analytics Capstone Project Data

    • kaggle.com
    Updated Feb 18, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Madd (2022). My Google Data Analytics Capstone Project Data [Dataset]. https://www.kaggle.com/maddmonty/my-google-data-analytics-capstone-project-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 18, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Madd
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Scenario This data is for capstone I play the role of a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The Lily Monero, director of marketing, believes the company’s future success depends on maximizing the number of annual memberships.

    Business Question: "How do annual members and casual riders use Cyclistic bikes differently?" Overall Goal: Design marketing strategies aimed at converting casual riders into annual members.

    Acknowledgements

    The project is required to use a year of user data, for the capstone project for the Google data analytics course. This is a sample from a larger data set from Google (https://divvy-tripdata.s3.amazonaws.com/index.html).

    The main objective is to determine the differences betwwen members and casual users of the Cyclistic data base, then mold best marketing strategies to turn casual bike riders into annual members.?

  8. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview Dataset in BigQuery TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information >about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this >dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and >evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This >means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on >this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public >datasets.

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
  9. S

    City Website Google Analytics

    • splitgraph.com
    • performance.ci.janesville.wi.us
    Updated Apr 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    performance-ci-janesville-wi-us (2023). City Website Google Analytics [Dataset]. https://www.splitgraph.com/performance-ci-janesville-wi-us/city-website-google-analytics-xeqs-kevn
    Explore at:
    json, application/openapi+json, application/vnd.splitgraph.imageAvailable download formats
    Dataset updated
    Apr 21, 2023
    Authors
    performance-ci-janesville-wi-us
    Description

    The City uses Google Analytics to track data about use of the City's website.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  10. Web requests analysis of Italy websites which use Google Analytics

    • zenodo.org
    bz2, txt, zip
    Updated Aug 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Federico Leva; Federico Leva (2022). Web requests analysis of Italy websites which use Google Analytics [Dataset]. http://doi.org/10.5281/zenodo.6793113
    Explore at:
    zip, bz2, txtAvailable download formats
    Dataset updated
    Aug 8, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Federico Leva; Federico Leva
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Ongoing analysis of over 650,000 websites with webbkoll. The front page for Italy-related domain names has been accessed through HTTPS or HTTP to gather data about third-party requests, cookies and other privacy-invasive features. Over 80 % of the websites in the sample appear to contain Google Analytics.

  11. S

    State of Iowa Google My Business Profile Analytics by Month

    • splitgraph.com
    • s.cnmilf.com
    • +3more
    Updated Jul 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mydata-iowa-gov (2024). State of Iowa Google My Business Profile Analytics by Month [Dataset]. https://www.splitgraph.com/mydata-iowa-gov/state-of-iowa-google-my-business-profile-analytics-ep5t-vkbi
    Explore at:
    application/openapi+json, json, application/vnd.splitgraph.imageAvailable download formats
    Dataset updated
    Jul 9, 2024
    Authors
    mydata-iowa-gov
    Area covered
    Iowa
    Description

    This dataset provides insights by month on how people find State of Iowa agency listings on the web via Google Search and Maps, and what they do once they find it to include providing reviews (ratings), accessing agency websites, requesting directions, and making calls.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  12. d

    Foot Traffic Data | South America | Real-Time GPS Mobility & Visitation...

    • datarade.ai
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Irys, Foot Traffic Data | South America | Real-Time GPS Mobility & Visitation Analytics [Dataset]. https://datarade.ai/data-products/real-time-historical-mobile-location-data-gps-south-ame-irys
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sqlAvailable download formats
    Dataset authored and provided by
    Irys
    Area covered
    South America, Saint Vincent and the Grenadines, Guadeloupe, Chile, Martinique, Aruba, Ecuador, Saint Kitts and Nevis, Saint Martin (French part), Saint Barthélemy, Dominica
    Description

    This foot traffic dataset provides GPS-based mobile movement signals from across South America. It is ideal for retailers, city agencies, advertisers, and real estate professionals seeking insights into how people move through physical locations and urban spaces.

    Each record includes:

    Device ID (IDFA or GAID) Timestamps (in milliseconds and readable format) GPS coordinates (lat/lon) Country code Horizontal accuracy (85%) Optional IP address, mobile carrier, and device model

    Access the data via polygon queries (up to 10,000 tiles), and receive files in CSV, JSON, or Parquet, delivered hourly or daily via API, AWS S3, or Google Cloud. Data freshness is strong (95% delivered within 3 days), with full historical backfill available from September 2024.

    This solution supports flexible credit-based pricing and is privacy-compliant under GDPR and CCPA.

    Key Attributes:

    Custom POI or polygon query capability Backfilled GPS traffic available across LATAM High-resolution movement with daily/hourly cadence GDPR/CCPA-aligned with opt-out handling Delivery via API or major cloud platforms

    Use Cases:

    Competitive benchmarking across malls or stores Transport and infrastructure planning Advertising attribution for outdoor/DOOH campaigns Footfall modeling for commercial leases City zoning, tourism, and planning investments Telecom & tower planning across developing corridors

  13. S

    Page Views per Day

    • splitgraph.com
    • data.edmonton.ca
    Updated Jul 17, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Edmonton (2019). Page Views per Day [Dataset]. https://www.splitgraph.com/edmonton-ca/page-views-per-day-ms8m-g4v9
    Explore at:
    application/openapi+json, json, application/vnd.splitgraph.imageAvailable download formats
    Dataset updated
    Jul 17, 2019
    Dataset authored and provided by
    City of Edmonton
    Description

    This dataset shows the number of page views each day of 2016 for data.edmonton.ca. This data is pulled from our Google Analytics and updated monthly.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  14. Web Analytics Market Analysis, Size, and Forecast 2025-2029: North America...

    • technavio.com
    pdf
    Updated Apr 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Web Analytics Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), APAC (China, India, Japan, and South Korea), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/web-analytics-market-industry-analysis
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    Snapshot img

    Web Analytics Market Size 2025-2029

    The web analytics market size is forecast to increase by USD 3.63 billion, at a CAGR of 15.4% between 2024 and 2029.

    The market is experiencing significant growth, driven by the rising preference for online shopping and the increasing adoption of cloud-based solutions. The shift towards e-commerce is fueling the demand for advanced web analytics tools that enable businesses to gain insights into customer behavior and optimize their digital strategies. Furthermore, cloud deployment models offer flexibility, scalability, and cost savings, making them an attractive option for businesses of all sizes. However, the market also faces challenges associated with compliance to data privacy and regulations. With the increasing amount of data being generated and collected, ensuring data security and privacy is becoming a major concern for businesses.
    Regulatory compliance, such as GDPR and CCPA, adds complexity to the implementation and management of web analytics solutions. Companies must navigate these challenges effectively to maintain customer trust and avoid potential legal issues. To capitalize on market opportunities and address these challenges, businesses should invest in robust web analytics solutions that prioritize data security and privacy while providing actionable insights to inform strategic decision-making and enhance customer experiences.
    

    What will be the Size of the Web Analytics Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    Request Free Sample

    The market continues to evolve, with dynamic market activities unfolding across various sectors. Entities such as reporting dashboards, schema markup, conversion optimization, session duration, organic traffic, attribution modeling, conversion rate optimization, call to action, content calendar, SEO audits, website performance optimization, link building, page load speed, user behavior tracking, and more, play integral roles in this ever-changing landscape. Data visualization tools like Google Analytics and Adobe Analytics provide valuable insights into user engagement metrics, helping businesses optimize their content strategy, website design, and technical SEO. Goal tracking and keyword research enable marketers to measure the return on investment of their efforts and refine their content marketing and social media marketing strategies.

    Mobile optimization, form optimization, and landing page optimization are crucial aspects of website performance optimization, ensuring a seamless user experience across devices and improving customer acquisition cost. Search console and page speed insights offer valuable insights into website traffic analysis and help businesses address technical issues that may impact user behavior. Continuous optimization efforts, such as multivariate testing, data segmentation, and data filtering, allow businesses to fine-tune their customer journey mapping and cohort analysis. Search engine optimization, both on-page and off-page, remains a critical component of digital marketing, with backlink analysis and page authority playing key roles in improving domain authority and organic traffic.

    The ongoing integration of user behavior tracking, click-through rate, and bounce rate into marketing strategies enables businesses to gain a deeper understanding of their audience and optimize their customer experience accordingly. As market dynamics continue to evolve, the integration of these tools and techniques into comprehensive digital marketing strategies will remain essential for businesses looking to stay competitive in the digital landscape.

    How is this Web Analytics Industry segmented?

    The web analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      Cloud-based
      On-premises
    
    
    Application
    
      Social media management
      Targeting and behavioral analysis
      Display advertising optimization
      Multichannel campaign analysis
      Online marketing
    
    
    Component
    
      Solutions
      Services
    
    
    Geography
    
      North America
    
        US
        Canada
    
    
      Europe
    
        France
        Germany
        Italy
        UK
    
    
      APAC
    
        China
        India
        Japan
        South Korea
    
    
      Rest of World (ROW)
    

    .

    By Deployment Insights

    The cloud-based segment is estimated to witness significant growth during the forecast period.

    In today's digital landscape, web analytics plays a pivotal role in driving business growth and optimizing online performance. Cloud-based deployment of web analytics is a game-changer, enabling on-demand access to computing resources for data analysis. This model streamlines business intelligence processes by collecting, integra

  15. Google Store Ecommerce Data + Fake Retail Data

    • kaggle.com
    Updated Apr 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jean-Philippe Allard (2019). Google Store Ecommerce Data + Fake Retail Data [Dataset]. https://www.kaggle.com/jpallard/google-store-ecommerce-data-fake-retail-data/metadata
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 5, 2019
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Jean-Philippe Allard
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset has been created for an example of implementing predictive modeling in a dashboard.

    Content

    The online.csv file contains actual order data manually imported from the Google Store public access Google Analytics. This data can't be accessed via API, unfortunately.

    The retail.csv is a heavily modified version of the UK retailer dataset, to approximate a retail location using another kind of POS for the Google store.

    The KEY_SKU.csv file is the link between stock codes and product skus the permit joining the files.

    The Marketing_Spend.csv file is a fake file containing marketing budgets for online and offline advertising. It was created to practice building a model predicting sales from the marketing budget.

    Inspiration

    Have fun!

  16. H

    Love Matters Sample Website Data 2018

    • dataverse.harvard.edu
    Updated May 22, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lindsay van Clief (2019). Love Matters Sample Website Data 2018 [Dataset]. http://doi.org/10.7910/DVN/WXUOA2
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 22, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Lindsay van Clief
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a sampling of google analytic data from the Love Matters websites in India, Mexico, Kenya, Nigeria, and Egypt. Love Matters is a program of RNW Media (www.rnw.org)

  17. d

    Global Technographic Data | Private Companies | Worldwide

    • datarade.ai
    Updated Aug 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Veridion (2025). Global Technographic Data | Private Companies | Worldwide [Dataset]. https://datarade.ai/data-products/global-technographic-data-private-companies-worldwide-veridion
    Explore at:
    .json, .xml, .csv, .xls, .txtAvailable download formats
    Dataset updated
    Aug 13, 2025
    Dataset authored and provided by
    Veridion
    Area covered
    Portugal, Cayman Islands, San Marino, Slovenia, Moldova (Republic of), New Caledonia, United States of America, Haiti, Saint Martin (French part), Bahamas
    Description

    Veridion’s technographic dataset delivers deterministic, verified insights into the technology stacks that power a company’s digital presence. This is not modeled or probabilistic data — it is extracted from first-party, real-world digital signals sourced from company websites, social media, press releases, and other online assets. The technographic layer is part of Veridion’s broader company profile, which also includes firmographics, business activities, products & services, ESG attributes, ownership structures, and location data. When combined, these layers allow clients to gain a deep, multi-dimensional understanding of both public and private companies across 245+ countries. At the core, technographic data answers critical questions such as: - What software, platforms, and tools does this company use to operate online? - Which content management systems (CMS), analytics tools, marketing automation platforms, payment gateways, or hosting services are in place? - What industry-specific applications or integrations signal the company’s operational maturity or market positioning? - How does the company’s tech adoption compare to competitors or peers in its sector?

    Data Sources & Collection Methodology Veridion’s approach to technographic intelligence is built for scale, frequency, and accuracy: - First-Party Digital Footprint Analysis – Veridion’s crawlers scan billions of web pages weekly, capturing up-to-date, verifiable signals from a company’s active online properties. - Multi-Source Validation – Detected technologies are cross-referenced with multiple independent sources, including metadata in site code, integrations disclosed in press releases, and verified vendor references. - Granular Taxonomy – Technologies are classified into structured categories for easy integration with customer workflows — for example: ◦ Web Hosting & Infrastructure (e.g., AWS, Azure, Google Cloud) ◦ Web Development Frameworks (e.g., React, Angular, Vue.js) ◦ Content Management Systems (e.g., WordPress, Shopify, Drupal) ◦ Analytics & BI Tools (e.g., Google Analytics, Mixpanel, Power BI) ◦ Marketing Automation & CRM (e.g., HubSpot, Salesforce, Marketo) ◦ Payment Gateways & E-commerce Platforms (e.g., Stripe, Magento) ◦ Industry-Specific Tools (e.g., hotel booking engines, telehealth platforms) - Weekly Updates: Because digital tech stacks evolve quickly, Veridion refreshes profiles weekly to detect changes, new adoptions, or deprecations. This ensures technographic data reflects the current state, not stale historical footprints.

    Core Features - Deterministic Detection – Identified technologies are based on confirmed signals, not statistical guesses. - Global Scale – Coverage of 130M+ operating companies in over 245 countries, including hard-to-find SMBs. - Granular Categorization – Technologies classified into operationally relevant groups to support segmentation and targeting. - Time-Series Tracking – Ability to see when a technology was first detected and track its lifecycle within the company profile. - Integrations Ready – Data is available via API, batch delivery, or through Veridion’s Data Discovery Platform for direct integration into CRM, MDM, ABM, or analytics tools.

    Technographic Data Use Cases

    1. Sales & Marketing Segmentation Challenge: Go-to-market teams often waste resources targeting broad, undifferentiated segments without knowing which prospects are actually a fit for their solution. Solution with Veridion: Filter and prioritize prospects based on the technologies they use, for example: • SaaS providers targeting companies that use complementary technologies (e.g., selling an SEO tool to companies already using HubSpot or WordPress). • Competitor displacement campaigns targeting companies running a rival product. • Market entry campaigns identifying verticals with high adoption of a given platform. Impact: Increased conversion rates, higher ROI on outbound campaigns, and reduced sales cycle length.

    2. Competitive Intelligence Challenge: Companies lack visibility into competitors’ penetration across markets or accounts. Solution with Veridion: Build competitive landscapes by mapping where specific technologies are deployed. Track adoption trends over time to identify market share shifts or early signs of competitive threats.

    3. Account-Based Marketing (ABM) Enrichment Challenge: ABM strategies rely on deep account intelligence, yet most CRM data is incomplete or outdated. Solution with Veridion: Enrich target accounts with verified technographics to personalize messaging, content, and offers.

    4. Partner & Channel Ecosystem Mapping Challenge: Partner managers need to find integrators, agencies, and resellers that work with specific technologies. Solution with Veridion: Use technographic filters to identify potential partners already experienced in the target technology ecosystem.

    5. Market Sizing & Opportunity Analysis Challenge: Pr...

  18. n

    Repository Analytics and Metrics Portal (RAMP) 2018 data

    • data.niaid.nih.gov
    • dataone.org
    • +2more
    zip
    Updated Jul 27, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance use data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "methods" section below. Note that the RAMP data model changed in August, 2018 and two sets of documentation are provided to describe data collection and processing before and after the change.

    Methods

    RAMP Data Documentation – January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    

    Following data processing describe below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.

    Data Collection from August 19, 2018 Onward

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following data processing describe below, on ingest into RAMP a additional field, citableContent, is added to the page level data.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository

  19. S

    Weekly Fort Collins OpenData Portal Analytics

    • splitgraph.com
    • opencity.fcgov.com
    • +1more
    Updated Jul 14, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    opendata-fcgov (2024). Weekly Fort Collins OpenData Portal Analytics [Dataset]. https://www.splitgraph.com/opendata-fcgov/weekly-fort-collins-opendata-portal-analytics-it47-5a8q/
    Explore at:
    application/openapi+json, json, application/vnd.splitgraph.imageAvailable download formats
    Dataset updated
    Jul 14, 2024
    Authors
    opendata-fcgov
    Area covered
    Fort Collins
    Description

    Weekly data from the Google Analytics tag for the Open Data Portal at OpenData.fcgov.com.

    Analytics shown are presumed to be non-City-employees, as these data come from computers external to the City network. Each day starting at the first day for which there are data is included, and the URL is either a specific page or "all", specifying that every page in the domain is included. Specific-page URLs are filtered to the main Portal page or data assets, so "all" may capture more pages than specified individually.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

    See the Splitgraph documentation for more information.

  20. n

    Data from: Repository Analytics and Metrics Portal (RAMP) 2021 data

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated May 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2023). Repository Analytics and Metrics Portal (RAMP) 2021 data [Dataset]. http://doi.org/10.5061/dryad.1rn8pk0tz
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 23, 2023
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance use data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2021. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    The record will be revised periodically to make new data available through the remainder of 2021.

    Methods

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following data processing describe below, on ingest into RAMP a additional field, citableContent, is added to the page level data.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

    As a result, two CSV datasets are provided for each month of published data:

    page-clicks:

    The data in these CSV files correspond to the page-level data, and include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “page-clicks”. For example, the file named 2021-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2021.

    country-device-info:

    The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    index: The Elasticsearch index corresponding to country and device access information data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “country-device-info”. For example, the file named 2021-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2021.

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Organization logoOrganization logo

Google Analytics Sample

Google Analytics Sample (BigQuery)

Explore at:
19 scholarly articles cite this dataset (View in Google Scholar)
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
Authors
Google BigQuery
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?

Search
Clear search
Close search
Google apps
Main menu