25 datasets found
  1. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
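    As a hedged sketch of how the first question can be approached, the example below uses the google-cloud-bigquery Python client against the public ga_sessions_* tables; the field names (device.browser, totals.transactions) come from the published sample schema, and project credentials are assumed to be configured.

      # Sketch: total transactions per device browser in July 2017 from the
      # public Google Analytics sample tables. Requires
      # `pip install google-cloud-bigquery` and default GCP credentials.
      from google.cloud import bigquery

      client = bigquery.Client()

      query = """
      SELECT
        device.browser AS browser,
        SUM(totals.transactions) AS total_transactions
      FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
      WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
      GROUP BY browser
      ORDER BY total_transactions DESC
      """

      for row in client.query(query).result():
          print(row.browser, row.total_transactions)

    The remaining questions follow the same pattern, substituting other documented fields and aggregations.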

  2. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=ko&inv=1&invt=AbzlTA (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=ko
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It is a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data. The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated (such as fullVisitorId) or removed (such as clientId, adWordsClickInfo, and geoNetwork). "Not available in demo dataset" will be returned for STRING values and "null" will be returned for INTEGER values when querying fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
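    Because obfuscated STRING fields come back as a placeholder string, analyses that group by such fields typically filter the placeholder out first. The sketch below is a minimal, hedged illustration using the google-cloud-bigquery Python client; the ga_sessions_* field names are taken from the public sample schema, and the placeholder is matched case-insensitively because its exact casing is not guaranteed here.

      # Sketch: visits per traffic source in July 2017, skipping rows where the
      # source is the demo-dataset obfuscation placeholder. Requires
      # `pip install google-cloud-bigquery` and default GCP credentials.
      from google.cloud import bigquery

      client = bigquery.Client()

      query = """
      SELECT
        trafficSource.source AS source,
        SUM(totals.visits) AS visits
      FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
      WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
        AND LOWER(trafficSource.source) NOT LIKE '%not available in demo dataset%'
      GROUP BY source
      ORDER BY visits DESC
      """

      for row in client.query(query).result():
          print(row.source, row.visits)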

  3. sample user merkle sokrati

    • kaggle.com
    Updated Jul 23, 2021
    Cite
    Tignangshu Chatterjee (2021). sample user merkle sokrati [Dataset]. https://www.kaggle.com/tignangshuchatterjee/sample-user-merkle-sokrati/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tignangshu Chatterjee
    Description

    Dataset

    This dataset was created by Tignangshu Chatterjee


  4. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Isle of Man, Taiwan, Nepal, Bangladesh, British Indian Ocean Territory, Andorra, Tunisia, Northern Mariana Islands, Moldova (Republic of), Canada
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  5. warehouse_orders.warehouse

    • kaggle.com
    Updated Aug 30, 2023
    Cite
    Amy Lai-Mei Fu (2023). warehouse_orders.warehouse [Dataset]. https://www.kaggle.com/datasets/almlmf/warehouse-orderswarehouse
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amy Lai-Mei Fu
    Description

    Sample data from the Google Data Analytics Professional Certificate course.

  6. Cyclistic-Google Data Analytics _Case Study 1

    • kaggle.com
    Updated Jun 5, 2024
    Cite
    Sivasumitha Padmanathan (2024). Cyclistic-Google Data Analytics _Case Study 1 [Dataset]. https://www.kaggle.com/datasets/sivasumitha/cyclistic-google-data-analytics-case-study-1/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sivasumitha Padmanathan
    Description

    ABOUT THE DATASET: The dataset contains historical trip data from Cyclistic, a bike-share company in Chicago, and is used to analyze trends and understand how a bike-share company can achieve rapid success. It is one of the samples used for the Google Data Analytics Professional Certificate capstone project on Coursera, and the data are used for learning purposes. To answer the business questions, follow the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, Act.

    ABOUT THE CASE STUDY: How does a bike-share navigate speedy success? The introduction of the case and the case study road map are in this document. To understand the data in the file, click Column Description; for an overview of this project, click Report.

    Remark: If you are planning on using Posit's RStudio, use the Divvy 2019 Q1 and Divvy 2020 Q1 datasets; choosing other data might lead to errors because the data exceeds the memory available in the free plan. When using Posit's RStudio, refer to the column descriptions for Divvy_Trips_2019_Q1.csv and Divvy_Trips_2020_Q1.csv.
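    If you work in Python instead, a minimal pandas sketch for the same two files is shown below; the file names come from the remark above, and the column names are left to inspection because the 2019 and 2020 exports use different schemas (the rename shown is only a hypothetical example).

      # Sketch: load the two Divvy quarterly exports named above and compare
      # their schemas before combining them. Paths are assumed to be local.
      import pandas as pd

      q1_2019 = pd.read_csv("Divvy_Trips_2019_Q1.csv")
      q1_2020 = pd.read_csv("Divvy_Trips_2020_Q1.csv")

      # The two quarters use different column names, so inspect them first
      # and build an explicit rename map before concatenating.
      print(sorted(q1_2019.columns))
      print(sorted(q1_2020.columns))

      # Hypothetical harmonization once the mapping is confirmed, e.g.:
      # q1_2019 = q1_2019.rename(columns={"trip_id": "ride_id"})
      # all_trips = pd.concat([q1_2019, q1_2020], ignore_index=True)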

  7. Advancing Open and Reproducible Water Data Science by Integrating Data...

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Feb 3, 2024
    Cite
    Jeffery S. Horsburgh (2024). Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://search.dataone.org/view/sha256%3Aade7f09cb2796fb2ad4e1a0f4b303d3a908e6ffc2f3a11ed7162d825f31bf864
    Explore at:
    Dataset updated
    Feb 3, 2024
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh
    Description

    Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository's established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS's National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI's HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using pip. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
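    As a brief, hedged illustration of the retrieval step described above, the sketch below pulls daily values from NWIS with the dataretrieval package; the site number and dates are placeholders, and the write-back to HydroShare with hsclient is shown only as a comment because it requires account credentials.

      # Sketch: retrieve USGS NWIS daily values with the dataretrieval package.
      # Install with `pip install dataretrieval hsclient`. The site number and
      # date range below are illustrative placeholders.
      import dataretrieval.nwis as nwis

      df = nwis.get_record(
          sites="09380000",   # example USGS gage; substitute your site of interest
          service="dv",       # daily values
          start="2020-01-01",
          end="2020-12-31",
      )
      print(df.head())

      # The results could then be saved and uploaded to a HydroShare resource
      # with hsclient (requires a HydroShare account), for example:
      #   from hsclient import HydroShare
      #   hs = HydroShare(username="...", password="...")
      #   resource = hs.resource("<resource_id>")
      #   resource.file_upload("nwis_daily_values.csv")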

    This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/

  8. Gallup Analytics

    • archive.ciser.cornell.edu
    Updated Feb 15, 2024
    Cite
    Gallup Organization (2024). Gallup Analytics [Dataset]. https://archive.ciser.cornell.edu/studies/2823
    Explore at:
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Gallup, Inc. (http://gallup.com/)
    Authors
    Gallup Organization
    Variables measured
    Individual
    Description

    Contains Gallup data from countries that are home to more than 98% of the world's population through a state-of-the-art Web-based portal. Gallup Analytics puts Gallup's best global intelligence in users' hands to help them better understand the strengths and challenges of the world's countries and regions. Users can access Gallup's U.S. Daily tracking and World Poll data to compare residents' responses region by region and nation by nation to questions on topics such as economic conditions, government and business, health and wellbeing, infrastructure, and education.

    The Gallup Analytics Database is accessed through the Cornell University Libraries here. A CUL subscription also allows access to the Gallup Respondent Level Data. For access, please refer to the documentation below and then request the variables you need here.

    Before requesting data from the World Poll, please see the Getting Started guide and the Worldwide Research Methodology and Codebook (you will need to request access). The Codebook will give you information about all available variables in the datasets. There are other guides available as well in the Google folder. You can also access information about questions asked and variables using the Gallup World Poll Reference Tool. You will need to create your user account to access the tool. This will only give you access to information about the questions asked and variables; it will not give you access to the data.

    For further documentation and information see this site from New York University Libraries. The Gallup documentation for the World Poll methodology is also available under the Data and Documentation tab.

    In addition to the World Poll and Daily Tracking Poll, also available are the Gallup Covid-19 Survey, Gallup Poll Social Series Surveys, Race Relations Survey, Confidence in Institutions Survey, Honesty and Ethics in Professions Survey, and Religion Battery.

    The process for getting access to respondent-level data from the Gallup U.S. Daily Tracking is similar to the World Poll Survey. There is no comparable discovery tool for U.S. Daily Tracking poll questions, however. Users need to consult the codebooks and available variables across years.

    The COVID-19 web survey began on March 13, 2020 with daily random samples of U.S. adults, aged 18 and older who are members of the Gallup Panel. Before requesting data, please see the Gallup Panel COVID-19 Survey Methodology and Codebook.

    The Gallup Poll Social Series (GPSS) dataset is a set of public opinion surveys designed to monitor U.S. adults’ views on numerous social, economic, and political topics. More information is available on the Gallup website: https://www.gallup.com/175307/gallup-poll-social-series-methodology.aspx As each month has a unique codebook, contact CCSS-ResearchSupport@cornell.edu to discuss your interests and start the data request process.

    In 1973, Gallup began measuring confidence levels in several US institutions such as Congress, the Presidency, the Supreme Court, and the Police. The dataset includes data beginning in 1973, and data are collected once per year. Users should consult the list of available variables.

    The Race Relations Poll includes topics that were previously represented in the GPSS Minority Relations Survey that ran through 2016. The Race Relations Survey was conducted November 2018. Users should consult the codebook for this poll before making their request.

    The Honesty and Ethics in Professions Survey: In 1976, Gallup began measuring US perceptions of the honesty and ethics of a list of professions. This dataset was added to the collection in March 2023 and includes data from 1976 to 2022. Documentation for this collection is located here and will require you to request access.

    Religion Battery: Consolidated list of items focused on religion in the US from 1999-2022. Documentation for this collection is located here and will require you to request access.

  9. Dogecoin Cryptocurrency Dataset

    • console.cloud.google.com
    Updated Jun 10, 2019
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Dogecoin&inv=1&invt=Abzqsw (2019). Dogecoin Cryptocurrency Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/dogecoin/crypto-dogecoin
    Explore at:
    Dataset updated
    Jun 10, 2019
    Dataset provided by
    Google (http://google.com/)
    Description

    Dogecoin is an open source peer-to-peer digital currency, favored by Shiba Inus worldwide. It is qualitatively more fun while being technically nearly identical to its close relative Bitcoin. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
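    For orientation, here is a hedged sketch of one such query issued through the Python BigQuery client; it assumes the dataset follows the unified Bitcoin-like schema described above, with a transactions table that carries a block_timestamp column.

      # Sketch: daily transaction counts for one week in the Dogecoin dataset,
      # assuming the unified Bitcoin-like schema (transactions.block_timestamp).
      from google.cloud import bigquery

      client = bigquery.Client()

      query = """
      SELECT
        DATE(block_timestamp) AS day,
        COUNT(*) AS tx_count
      FROM `bigquery-public-data.crypto_dogecoin.transactions`
      WHERE block_timestamp >= TIMESTAMP '2019-06-01'
        AND block_timestamp < TIMESTAMP '2019-06-08'
      GROUP BY day
      ORDER BY day
      """

      for row in client.query(query).result():
          print(row.day, row.tx_count)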

  10. BestPlace: POI Dataset, GIS Database, Census data for Retail CPG & FMCG...

    • datarade.ai
    Updated Sep 8, 2023
    + more versions
    Cite
    BestPlace (2023). BestPlace: POI Dataset, GIS Database, Census data for Retail CPG & FMCG analytics [Dataset]. https://datarade.ai/data-products/bestplace-poi-dataset-gis-database-census-data-for-retail-bestplace
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Sep 8, 2023
    Dataset authored and provided by
    BestPlace
    Area covered
    Tunisia, Mongolia, Cameroon, Isle of Man, United Kingdom, Taiwan, Nicaragua, Morocco, Latvia, Israel
    Description

    BestPlace is an innovative retail data and analytics tool created explicitly for medium and enterprise-level CPG/FMCG companies. It is designed to revolutionize your retail data analysis approach by adding a strategic location-based perspective to your existing database. This perspective enriches your data landscape and allows your business to better understand and cater to shopping behavior.

    An In-Depth Approach to Retail Analytics: Unlike conventional analytics tools, BestPlace delves deep into the details of each store location, providing a comprehensive analysis of your retail database. We leverage unique tools and methodologies to extract, analyze, and compile data. Our processes have been carefully designed to provide a holistic view of your business, equipping you with the information you need to make data-driven decisions.

    Amplifying Your Database with BestPlace: At BestPlace, we understand the importance of a robust and informative retail database design. We don't just add new stores to your database; we enrich each store with vital characteristics and factors. These enhancements come from open cartographic sources such as Google Maps and our proprietary GIS database, all carefully collected and curated by our experienced data analysts.

    Store Features: We enrich your retail database with an array of store features, which include but are not limited to:

    • Number of reviews
    • Average ratings
    • Operational hours
    • Categories relevant to each point

    Our attention to detail ensures your retail database becomes a powerful tool for understanding customer interactions and preferences.

    Extensive Use Cases: BestPlace's capabilities stretch across various applications, offering value in areas such as:

    • Competition Analysis: Identify your competitors, analyze their performance, and understand your standing in the market with our extensive POI database and retail data analytics capabilities.
    • New Location Search: Use our rich retail store database to identify ideal locations for store expansions based on foot traffic data, proximity to key points, and potential customer demographics.

  11. Repository Analytics and Metrics Portal (RAMP) 2020 data

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 23, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2020 data [Dataset]. http://doi.org/10.5061/dryad.dv41ns1z4
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Version update: The originally uploaded versions of the CSV files in this dataset included an extra column, "Unnamed: 0," which is not RAMP data and was an artifact of the process used to export the data to CSV format. This column has been removed from the revised dataset. The data are otherwise the same as in the first version.

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2020. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    Methods

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
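    For readers who want to see the shape of such a download, the sketch below issues a page-level request against the v3 'webmasters' API referenced above; the site URL, date, and credentials object are placeholders, and RAMP's production harvester may differ in its details.

      # Sketch: one day of page-level Search Console data for a single property.
      # `creds` is a placeholder for OAuth2 or service-account credentials that
      # are authorized for the property. Requires `pip install google-api-python-client`.
      from googleapiclient.discovery import build

      def fetch_page_stats(creds, site_url, day):
          service = build("webmasters", "v3", credentials=creds)
          body = {
              "startDate": day,
              "endDate": day,
              "dimensions": ["page", "date"],
              "rowLimit": 25000,
          }
          response = service.searchanalytics().query(siteUrl=site_url, body=body).execute()
          # Each row carries keys (page URL, date), clicks, impressions, ctr, position.
          return response.get("rows", [])

      # Example call with placeholder values:
      # rows = fetch_page_stats(creds, "https://repository.example.edu/", "2020-01-15")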

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
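    The flagging step itself amounts to a file-type test on each URL. A minimal sketch of such a test follows; the extension list is illustrative and is not RAMP's actual rule set.

      # Sketch: flag URLs that point to content files rather than HTML pages.
      # The extension list is illustrative only.
      from urllib.parse import urlparse

      CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip", ".txt"}

      def citable_content(url: str) -> str:
          path = urlparse(url).path.lower()
          return "Yes" if any(path.endswith(ext) for ext in CONTENT_EXTENSIONS) else "No"

      print(citable_content("https://repository.example.edu/bitstream/123/paper.pdf"))  # Yes
      print(citable_content("https://repository.example.edu/handle/123"))               # No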

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
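    Applied to one of the published page-clicks CSV files described below, these two steps reduce to a short pandas aggregation; the sketch below uses the example file name given later in this description and the documented citableContent, clicks, and date fields.

      # Sketch: compute citable content downloads (CCD) for a date range by
      # filtering citableContent == "Yes" and summing clicks.
      import pandas as pd

      df = pd.read_csv("2020-01_RAMP_all_page-clicks.csv")  # example monthly file
      df["date"] = pd.to_datetime(df["date"])

      window = df[(df["date"] >= "2020-01-01") & (df["date"] <= "2020-01-07")]
      ccd = window.loc[window["citableContent"] == "Yes", "clicks"].sum()
      print("CCD:", ccd)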
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

    As a result, two CSV datasets are provided for each month of published data:

    page-clicks:

    The data in these CSV files correspond to the page-level data, and include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “page-clicks”. For example, the file named 2020-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2020.

    country-device-info:

    The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    index: The Elasticsearch index corresponding to country and device access information data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “country-device-info”. For example, the file named 2020-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2020.
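    As a short usage sketch, either monthly file can be loaded with pandas and narrowed to one participating repository via the repository_id field recommended above; the repository_id value shown is a placeholder.

      # Sketch: load one month of page-clicks data and keep a single repository,
      # then total impressions and clicks per day for that repository.
      import pandas as pd

      pages = pd.read_csv("2020-01_RAMP_all_page-clicks.csv")
      one_ir = pages[pages["repository_id"] == "example_repository"]  # placeholder id

      daily = one_ir.groupby("date")[["impressions", "clicks"]].sum()
      print(daily.head())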

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  12. COVID-19 Community Mobility Reports

    • google.com
    • google.com.tr
    • +6more
    csv, pdf
    Updated Oct 17, 2022
    + more versions
    Cite
    Google (2022). COVID-19 Community Mobility Reports [Dataset]. https://www.google.com/covid19/mobility/
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Oct 17, 2022
    Dataset provided by
    Google Search (http://google.com/)
    Google (http://google.com/)
    Authors
    Google
    Description

    As global communities responded to COVID-19, we heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps would be helpful as they made critical decisions to combat COVID-19. These Community Mobility Reports aimed to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.

  13. Repository Analytics and Metrics Portal (RAMP) 2018 data

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Jul 27, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "methods" section below. Note that the RAMP data model changed in August 2018, and two sets of documentation are provided to describe data collection and processing before and after the change.

    Methods

    RAMP Data Documentation – January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    

    Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.
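    A hedged sketch of assembling a full year from these monthly files is shown below; the file names follow the convention just described, while the ./data directory layout is an assumption.

      # Sketch: concatenate the monthly 2018 exports (2018-01_RAMP_all.csv through
      # 2018-12_RAMP_all.csv) into a single DataFrame. Assumes the files sit in ./data.
      import glob
      import pandas as pd

      files = sorted(glob.glob("data/2018-*_RAMP_all.csv"))
      ramp_2018 = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

      print(len(files), "files,", len(ramp_2018), "rows")
      print(ramp_2018.columns.tolist())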

    Data Collection from August 19, 2018 Onward

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

  14. Bitcoin Cash Cryptocurrency Dataset

    • console.cloud.google.com
    Updated Jun 9, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Bitcoin%20Cash&hl=ko&inv=1&invt=AbyoYA (2023). Bitcoin Cash Cryptocurrency Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bitcoin-cash/crypto-bitcoin-cash?hl=ko
    Explore at:
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Bitcoin Cash is a cryptocurrency that allows more bytes to be included in each block relative to its common ancestor Bitcoin. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  15. Non Relational Sql Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). Non Relational Sql Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/non-relational-sql-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Oct 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Non-Relational SQL Market Outlook



    The Non-Relational SQL market size is projected to grow from USD 4.7 billion in 2023 to USD 15.8 billion by 2032, at a compound annual growth rate (CAGR) of 14.5% during the forecast period. This significant growth can be attributed to the rising demand for scalable and flexible database management solutions that efficiently handle large volumes of unstructured data.



    One of the primary growth factors driving the Non-Relational SQL market is the exponential increase in data generation from various sources such as social media, IoT devices, and enterprise applications. As businesses seek to leverage this data for gaining insights and making informed decisions, the need for databases that can manage and process unstructured data efficiently has become paramount. Non-Relational SQL databases, such as document stores and graph databases, provide the required flexibility and scalability, making them an ideal choice for modern data-driven enterprises.



    Another significant growth factor is the increasing adoption of cloud-based solutions. Cloud deployment offers numerous advantages, including reduced infrastructure costs, scalability, and easier management. These benefits have led to a surge in the adoption of Non-Relational SQL databases hosted on cloud platforms. Major cloud service providers like Amazon Web Services, Microsoft Azure, and Google Cloud offer robust Non-Relational SQL database services, further fueling market growth. Additionally, the integration of AI and machine learning with Non-Relational SQL databases is expected to enhance their capabilities, driving further adoption.



    The rapid advancement in technology and the growing need for real-time data processing and analytics are also propelling the market's growth. Non-Relational SQL databases are designed to handle high-velocity data and provide quick query responses, making them suitable for real-time applications such as fraud detection, recommendation engines, and personalized marketing. As organizations increasingly rely on real-time data to enhance customer experiences and optimize operations, the demand for Non-Relational SQL databases is set to rise.



    Regional outlook indicates that North America holds the largest share of the Non-Relational SQL market, driven by the presence of major technology companies and early adoption of advanced database technologies. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by the rapid digital transformation initiatives and increasing investments in cloud infrastructure. Europe and Latin America also present significant growth opportunities due to the rising adoption of big data and analytics solutions.



    Database Type Analysis



    When analyzing the Non-Relational SQL market by database type, we observe that document stores hold a significant share of the market. Document stores, such as MongoDB and Couchbase, are particularly favored for their ability to store, retrieve, and manage document-oriented information. These databases are highly flexible, allowing for the storage of complex data structures and providing an intuitive query language. The increasing adoption of document stores can be ascribed to their ease of use and adaptability to various application requirements, making them a popular choice among developers and businesses.
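    To make the document-store model concrete, here is a small illustrative sketch using MongoDB's Python driver; the connection string, database, collection, and field names are placeholders rather than details drawn from the report.

      # Sketch: storing and querying a nested document in MongoDB via pymongo.
      # Requires `pip install pymongo` and a MongoDB instance at the given URI.
      from pymongo import MongoClient

      client = MongoClient("mongodb://localhost:27017")
      orders = client["shop"]["orders"]

      # Documents can hold nested structures without a fixed relational schema.
      orders.insert_one({
          "order_id": "A-1001",
          "customer": {"name": "Ada", "country": "DE"},
          "items": [{"sku": "SKU-1", "qty": 2}, {"sku": "SKU-7", "qty": 1}],
      })

      # Query on a nested field using dot notation.
      for doc in orders.find({"customer.country": "DE"}):
          print(doc["order_id"], doc["items"])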



    Key-Value stores represent another crucial segment of the Non-Relational SQL market. These databases are known for their simplicity and high performance, making them ideal for caching, session management, and real-time data processing applications. Redis and Amazon DynamoDB are prominent examples of key-value stores that have gained widespread acceptance. The growing need for low-latency data access and the ability to handle massive volumes of data efficiently are key drivers for the adoption of key-value stores in various industries.



    The market for column stores is also expanding as businesses require databases that can handle large-scale analytical queries efficiently. Columnar storage formats, such as Apache Cassandra and HBase, optimize read and write performance for analytical processing, making them suitable for big data analytics and business intelligence applications. The ability to perform complex queries on large datasets quickly is a significant advantage of column stores, driving their adoption in industries that rely heavily on data analytics.



    Graph databases, such as Neo4j and Amazon Neptune, are gaining traction due to their ability to model

  16. Repository Analytics and Metrics Portal (RAMP) 2019 data

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 14, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2019 data [Dataset]. http://doi.org/10.5061/dryad.crjdfn342
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 14, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Version update: The originally uploaded versions of the CSV files in this dataset included an extra column, "Unnamed: 0," which is not RAMP data and was an artifact of the process used to export the data to CSV format. This column has been removed from the revised dataset. The data are otherwise the same as in the first version.

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2019. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    Methods

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
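
    The sketch below illustrates this kind of page-level download using the google-api-python-client bindings for the original Search Console API; the site URL, credentials object, and date are placeholders rather than values taken from RAMP, and RAMP's own harvesting code may differ.

```python
# A minimal sketch of downloading page-level search analytics rows
# from the Search Console API (webmasters v3). Credentials, site URL,
# and date are placeholders.
from googleapiclient.discovery import build

def fetch_page_stats(credentials, site_url: str, day: str) -> list[dict]:
    service = build("webmasters", "v3", credentials=credentials)
    request = {
        "startDate": day,
        "endDate": day,
        "dimensions": ["page", "date"],  # one row per URL per day
        "rowLimit": 25000,
    }
    response = service.searchanalytics().query(siteUrl=site_url, body=request).execute()
    rows = []
    for row in response.get("rows", []):
        page, date = row["keys"]
        rows.append({
            "url": page,
            "impressions": row["impressions"],
            "clicks": row["clicks"],
            "clickThrough": row["ctr"],   # clicks divided by impressions
            "position": row["position"],
            "date": date,
        })
    return rows
```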

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
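
    A minimal sketch of this kind of flagging logic, assuming a simple file-extension test (the extension list shown is illustrative, not RAMP's actual rule set):

```python
# Flag URLs that point to non-HTML content files as "citable content".
from urllib.parse import urlparse

CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip", ".txt"}

def citable_content(url: str) -> str:
    path = urlparse(url).path.lower()
    return "Yes" if any(path.endswith(ext) for ext in CONTENT_EXTENSIONS) else "No"

print(citable_content("https://repo.example.edu/handle/123/456"))                # No
print(citable_content("https://repo.example.edu/bitstream/123/456/thesis.pdf"))  # Yes
```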

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
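
    A minimal pandas sketch of these two steps, assuming the page-level data are already loaded into a DataFrame with date, citableContent, and clicks columns:

```python
# Compute citable content downloads (CCD) for a date range.
import pandas as pd

def citable_content_downloads(pages: pd.DataFrame, start: str, end: str) -> int:
    in_range = pages["date"].between(start, end)        # ISO dates compare correctly
    citable = pages["citableContent"] == "Yes"           # step 1: filter citable rows
    return int(pages.loc[in_range & citable, "clicks"].sum())  # step 2: sum clicks
```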
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

    As a result, two CSV datasets are provided for each month of published data:

    page-clicks:

    The data in these CSV files correspond to the page-level data, and include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “page-clicks”. For example, the file named 2019-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2019.
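
    For example, one month of page-click data can be summarized per repository with pandas; the local file path is an assumption:

```python
# Summarize citable content downloads per repository for one published month.
import pandas as pd

pages = pd.read_csv("2019-01_RAMP_all_page-clicks.csv")

ccd_by_repo = (
    pages[pages["citableContent"] == "Yes"]
    .groupby("repository_id")["clicks"]
    .sum()
    .sort_values(ascending=False)
)
print(ccd_by_repo.head())
```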

    country-device-info:

    The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    index: The Elasticsearch index corresponding to country and device access information data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “country-device-info”. For example, the file named 2019-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2019.
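
    A corresponding sketch for the country/device files, again assuming the published file is available locally:

```python
# Summarize clicks by device and by country for one published month.
import pandas as pd

cd = pd.read_csv("2019-01_RAMP_all_country-device-info.csv")

clicks_by_device = cd.groupby("device")["clicks"].sum()
top_countries = cd.groupby("country")["clicks"].sum().nlargest(10)
print(clicks_by_device)
print(top_countries)
```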

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  17. Data from: Repository Analytics and Metrics Portal (RAMP) 2021 data

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated May 23, 2023
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2023). Repository Analytics and Metrics Portal (RAMP) 2021 data [Dataset]. http://doi.org/10.5061/dryad.1rn8pk0tz
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 23, 2023
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2021. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    The record will be revised periodically to make new data available through the remainder of 2021.

    Methods

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

    The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

    Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

    As a result, two CSV datasets are provided for each month of published data:

    page-clicks:

    The data in these CSV files correspond to the page-level data, and include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “page-clicks”. For example, the file named 2021-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2021.
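
    Because the monthly files share a common schema, a full year can be combined for year-level analysis; a minimal sketch, assuming all published 2021 page-click CSVs sit in the working directory:

```python
# Combine the monthly 2021 page-click files and compute year-to-date CCD per repository.
import glob
import pandas as pd

files = sorted(glob.glob("2021-*_RAMP_all_page-clicks.csv"))
year = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

ccd_2021 = (
    year[year["citableContent"] == "Yes"]
    .groupby("repository_id")["clicks"]
    .sum()
)
print(ccd_2021.sort_values(ascending=False).head())
```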

    country-device-info:

    The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

    country: The country from which the corresponding search originated.
    device: The device used for the search.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    date: The date of the search.
    index: The Elasticsearch index corresponding to country and device access information data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data end with “country-device-info”. For example, the file named 2021-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2021.

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  18. Data from: Bitcoin Cryptocurrency

    • console.cloud.google.com
    Updated Aug 31, 2019
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Bitcoin&inv=1&invt=AbzlbQ (2019). Bitcoin Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/bitcoin/crypto-bitcoin
    Explore at:
    Dataset updated
    Aug 31, 2019
    Dataset provided by
    Googlehttp://google.com/
    Description

    Bitcoin is a cryptocurrency leveraging blockchain technology to store transactions in a distributed ledger. A blockchain is an ever-growing tree of blocks. Each block contains a number of transactions. To learn more, read the Bitcoin Wiki. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. To further interoperate with Ethereum and ERC-20 token transactions, we also created some views that abstract the blockchain ledger to be presented as a double-entry accounting ledger. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out our blog post on the Google Cloud Big Data Blog and try the sample query below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery.
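
    A hedged sketch of querying this dataset from Python with the BigQuery client library follows; the table and column names (crypto_bitcoin.transactions, block_timestamp) reflect the public dataset as commonly documented and should be verified in the BigQuery console, and query bytes processed count against the free tier.

```python
# Count Bitcoin transactions per day using the public BigQuery dataset.
# Table/column names are assumptions to verify against the dataset schema.
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials

sql = """
SELECT
  DATE(block_timestamp) AS day,
  COUNT(*) AS tx_count
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp >= TIMESTAMP('2019-08-01')
GROUP BY day
ORDER BY day
"""

df = client.query(sql).to_dataframe()
print(df.head())
```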

  19. 202312-2024_Cyclistic-tripdata

    • kaggle.com
    Updated Jan 24, 2025
    Cite
    GODWIN ABIANGA (2025). 202312-2024_Cyclistic-tripdata [Dataset]. https://www.kaggle.com/datasets/godwinabianga/202312-2024-divvy-tripdata/versions/1
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Kaggle
    Authors
    GODWIN ABIANGA
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    A cleaned dataset used in the Google Data Analytics Capstone Project. The raw data can be found at the original source: https://divvy-tripdata.s3.amazonaws.com/index.html. The data used here are the 202312 and 2024 (01-11) divvy-tripdata.zip files.

    The data have already been cleaned, and a few more columns have been added for further analysis.

    Content

    The sample dataset contains data from a fictional company known as Cyclistic, a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. It includes the following kinds of information:

    Acknowledgements

    Time Frame: December 2023 - November 2024

    File source: https://divvy-tripdata.s3.amazonaws.com/index.html

    Special thanks to: Google and Coursera

    Inspiration

    The following questions will guide the future marketing program:

    • How do annual members and casual riders use Cyclistic bikes differently?
    • Why would casual riders buy Cyclistic annual memberships?
    • How can Cyclistic use digital media to influence casual riders to become members?
    • Are annual members much more profitable than casual riders?
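
    A minimal sketch addressing the first question, assuming the cleaned files keep the standard Divvy column names (started_at, ended_at, member_casual); the author's added columns and the exact filename below are placeholders and may differ from the published files.

```python
# Compare ride counts and ride lengths for annual members vs. casual riders.
import pandas as pd

trips = pd.read_csv(
    "cyclistic_tripdata_cleaned.csv",          # placeholder filename
    parse_dates=["started_at", "ended_at"],
)
trips["ride_minutes"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60

summary = (
    trips.groupby("member_casual")["ride_minutes"]
    .agg(["count", "mean", "median"])
    .round(1)
)
print(summary)  # how annual members and casual riders differ in ride length
```
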
  20. Repository Analytics and Metrics Portal (RAMP) 2017 data

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 27, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2017 data [Dataset]. http://doi.org/10.5061/dryad.r7sqv9scf
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    University of New Mexico
    Montana State University
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2017. For a description of the data collection, processing, and output methods, please see the "methods" section below.

    Methods

    RAMP Data Documentation: January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    

    Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data follow the format 2017-01_RAMP_all.csv. Using this example, the file 2017-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2017.
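
    A minimal sketch for this combined 2017 layout, where page, country, and device fields appear in a single monthly file; the local path is an assumption:

```python
# Summarize citable content downloads by country and device for one 2017 month.
import pandas as pd

df = pd.read_csv("2017-01_RAMP_all.csv")

ccd_by_country_device = (
    df[df["citableContent"] == "Yes"]
    .groupby(["country", "device"])["clicks"]
    .sum()
    .sort_values(ascending=False)
)
print(ccd_by_country_device.head(10))
```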

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
