44 datasets found
  1. BigQuery Sample Tables

    • kaggle.com
    zip
    Updated Sep 4, 2018
    Cite
    Google BigQuery (2018). BigQuery Sample Tables [Dataset]. https://www.kaggle.com/bigquery/samples
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 4, 2018
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.

    Content

    • gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

    • github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.

    • github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.

    • natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.

    • shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.

    • trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.

    • wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.

    Fork this kernel to get started.

    Acknowledgements

    Data Source: https://cloud.google.com/bigquery/sample-tables

    Banner Photo by Mervyn Chan from Unsplash.

    Inspiration

    How many babies were born in New York City on Christmas Day?

    How many words are in the play Hamlet?
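
    As a quick-start sketch (not part of the original description), the Hamlet question can be answered with the BigQuery Python client, assuming the sample tables are reachable as bigquery-public-data.samples.shakespeare with columns word, word_count and corpus:

    # Minimal sketch: total word count in Hamlet from the shakespeare sample table.
    # Assumes google-cloud-bigquery is installed and a GCP project is configured.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT SUM(word_count) AS total_words
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = 'hamlet'
    """
    print(client.query(query).to_dataframe())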

  2. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview (dataset also available in BigQuery): TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. The original public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
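
    As a minimal sketch of how the CSV files described below fit together (assuming the files are in the working directory and use the column names listed; retail_price is used as a proxy for the sale amount, since order_items.csv has no price column in this listing):

    # Join order lines to products and total retail value by category.
    import pandas as pd

    order_items = pd.read_csv("order_items.csv")
    products = pd.read_csv("products.csv")

    lines = order_items.merge(products, left_on="product_id", right_on="id",
                              suffixes=("_order_item", "_product"))
    revenue_by_category = (lines.groupby("category")["retail_price"]
                                .sum()
                                .sort_values(ascending=False))
    print(revenue_by_category.head(10))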

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
  3. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website.

    About BigQuery: This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

    Download Dataset: This public dataset is also hosted in Google Cloud Storage and available free to use. Use the quick start guide to learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query; see here for options and instructions. Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions to download and install the gcloud CLI, then:

    To remove the login requirement, run:
    $ gcloud config set auth/disable_credentials True

    To download the dataset, run:
    $ gcloud storage cp gs://ads-transparency-center/* . -R
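
    Before writing queries against creative_stats or removed_creative_stats, you can list the available columns; a minimal sketch with the BigQuery Python client (the dataset ID google_ads_transparency_center is an assumption inferred from the Marketplace URL above and should be checked in the console):

    # List tables and columns of the public Ads Transparency Center dataset.
    from google.cloud import bigquery

    client = bigquery.Client()
    schema = client.query("""
        SELECT table_name, column_name, data_type
        FROM `bigquery-public-data.google_ads_transparency_center`.INFORMATION_SCHEMA.COLUMNS
        ORDER BY table_name, ordinal_position
    """).to_dataframe()
    print(schema)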

  4. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
    Available download formats: zip (365940125 bytes)
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The following describes the Kaggle mirror of the cloud-training-demos.SAP_REPLICATED_DATA BigQuery public dataset.

    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

    Content:

    • Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data.
    • Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics.
    • Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

    Usage:

    • Business Analytics: Users can analyze business trends, sales performance, and financial metrics.
    • Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation.
    • Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

    Example Use Cases:

    • Sales Analysis: Track and analyze sales performance across different regions and time periods.
    • Inventory Management: Monitor inventory levels and identify trends in stock movements.
    • Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.
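
    As a minimal first-look sketch (assumptions: the BigQuery table names mirror the file names in the list below, e.g. bkpf for accounting document headers, and the dataset is readable from your project; no particular column names are assumed):

    # Preview a few rows of the accounting document header table.
    # Note: BigQuery bills for the full columns scanned even with LIMIT.
    from google.cloud import bigquery

    client = bigquery.Client()
    preview = client.query("""
        SELECT *
        FROM `cloud-training-demos.SAP_REPLICATED_DATA.bkpf`
        LIMIT 10
    """).to_dataframe()
    print(preview.shape)
    print(preview.head())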

    Tables:

    The main tables are summarized below:

    • adr6.csv: Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    • adrc.csv: General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    • adrct.csv: Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    • adrt.csv: Address Details. Includes detailed address data such as street addresses, city, and country codes.
    • ankt.csv: Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    • anla.csv: Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    • bkpf.csv: Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    • bseg.csv: Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    • but000.csv: Business Partners. Contains basic information about business partners, including IDs and names.
    • but020.csv: Business Partner Addresses. Provides address details associated with business partners.
    • cepc.csv: Customer Master Data - Central. Contains centralized data for customer master records.
    • cepct.csv: Customer Master Data - Contact. Provides contact details associated with customer records.
    • csks.csv: Cost Center Master Data. Contains data about cost centers within the organization.
    • cskt.csv: Cost Center Texts. Provides text descriptions and labels for cost centers.
    • dd03l.csv: Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    • ekbe.csv: Purchase Order History. Details history of purchase orders, including quantities and values.
    • ekes.csv: Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    • eket.csv: Purchase Order Item History. Details changes and statuses for individual purchase order items.
    • ekkn.csv: Purchase Order Account Assignment. Provides account assignment details for purchas...
  5. Tezos Cryptocurrency

    • console.cloud.google.com
    Updated Aug 10, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Cloud%20Public%20Datasets%20-%20Finance&hl=en-GB (2023). Tezos Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-tezos-dataset?hl=en-GB
    Explore at:
    Dataset updated
    Aug 10, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Tezos is a technology for deploying a blockchain capable of modifying its own set of rules with minimal disruption to the network through an on-chain governance model. Learn more... This dataset is one of many crypto datasets available within the Google Cloud Public Datasets program. As with other Google Cloud public datasets, you can query this dataset for free, up to 1TB of processing every month. Watch this short video to learn how to get started with the public datasets. Want to know how the data from these blockchains were brought into BigQuery, and learn how to analyze the data? Learn more.

  6. Post-Processing National Water Model Long-Range Forecasts with Random Forest...

    • search.dataone.org
    • hydroshare.org
    Updated Dec 14, 2024
    Cite
    Jacob Anderson (2024). Post-Processing National Water Model Long-Range Forecasts with Random Forest Regression in the Cloud to Improve Forecast Accuracy for Decision-Makers and Water Managers - Script/Data [Dataset]. https://search.dataone.org/view/sha256%3A50abc8f187746159df8ac98d1a6eda224082e6ee902ab18f6d55f7d151291447
    Explore at:
    Dataset updated
    Dec 14, 2024
    Dataset provided by
    Hydroshare
    Authors
    Jacob Anderson
    Description

    This resource contains the Python script, run within the Google Cloud Console, used to bias-correct the NWM long-range forecasts.
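
    As a purely illustrative sketch of the general technique named in the title (random forest regression for bias-correcting forecasts), not the authors' script; the file names and column names below are hypothetical placeholders:

    # Train a random forest on past (forecast, observation) pairs, then
    # post-process new raw NWM forecasts with it.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    pairs = pd.read_csv("forecast_obs_pairs.csv")          # hypothetical file
    features = ["nwm_forecast_cms", "lead_time_hr", "day_of_year"]  # hypothetical columns

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(pairs[features], pairs["observed_cms"])       # hypothetical target column

    new_fc = pd.read_csv("new_forecasts.csv")               # hypothetical file
    new_fc["corrected_cms"] = model.predict(new_fc[features])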

  7. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=en_GB (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=en_GB
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It's a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated (such as fullVisitorId) or removed (such as clientId, adWordsClickInfo and geoNetwork). "Not available in demo dataset" will be returned for STRING values and "null" will be returned for INTEGER values when querying fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
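
    A minimal sketch of a typical question against this data (assuming the public dataset ID google_analytics_sample and the standard GA360 export schema with totals and trafficSource records):

    # Sessions and transactions by traffic source for August 2016.
    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query("""
        SELECT trafficSource.source AS source,
               COUNT(*) AS sessions,
               SUM(totals.transactions) AS transactions
        FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
        WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20160831'
        GROUP BY source
        ORDER BY transactions DESC
        LIMIT 10
    """).to_dataframe()
    print(df)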

  8. Stack Overflow Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Stack Overflow (http://stackoverflow.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    Context

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

    Content

    Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: https://archive.org/download/stackexchange

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

    https://cloud.google.com/bigquery/public-data/stackoverflow

    Banner Photo by Caspar Rubin from Unsplash.

    Inspiration

    What is the percentage of questions that have been answered over the years?

    What is the reputation and badge count of users across different tenures on StackOverflow?

    What are 10 of the “easier” gold badges to earn?

    Which day of the week has most questions answered within an hour?
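
    A minimal sketch for the first question above, using the BigQuery mirror of this archive (assuming the bigquery-public-data.stackoverflow dataset with a posts_questions table containing creation_date and answer_count):

    # Percentage of questions with at least one answer, by year.
    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query("""
        SELECT EXTRACT(YEAR FROM creation_date) AS year,
               ROUND(100 * COUNTIF(answer_count > 0) / COUNT(*), 1) AS pct_answered
        FROM `bigquery-public-data.stackoverflow.posts_questions`
        GROUP BY year
        ORDER BY year
    """).to_dataframe()
    print(df)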

  9. BigQuery GIS Utility Datasets (U.S.)

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Google BigQuery (2019). BigQuery GIS Utility Datasets (U.S.) [Dataset]. https://www.kaggle.com/bigquery/utility-us
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.utility_us.[TABLENAME].

    • Project: "bigquery-public-data"
    • Dataset: "utility_us"

    Fork this kernel to get started and learn how to safely analyze large BigQuery datasets.

    If you're using Python, you can start with this code:

    # Connect the Kaggle bq_helper wrapper to the utility_us dataset.
    import pandas as pd
    from bq_helper import BigQueryHelper
    bq_assistant = BigQueryHelper("bigquery-public-data", "utility_us")
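
    Continuing the snippet above, a minimal sketch of exploring the dataset safely (the table name us_states_area is an assumption; pick one from list_tables() if it differs):

    # List the GIS utility tables, then run a small, size-capped query.
    print(bq_assistant.list_tables())
    sample = bq_assistant.query_to_pandas_safe(
        "SELECT * FROM `bigquery-public-data.utility_us.us_states_area` LIMIT 5",
        max_gb_scanned=1)
    print(sample)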
    
  10. SEC Public Dataset

    • console.cloud.google.com
    Updated Jul 19, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=en_GB (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=en_GB
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables, and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Learn more.

  11. Data from: Bitcoin Cryptocurrency

    • console.cloud.google.com
    Updated Mar 26, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Bitcoin&hl=fr_FR (2023). Bitcoin Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/bitcoin/crypto-bitcoin?hl=fr_FR
    Explore at:
    Dataset updated
    Mar 26, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Bitcoin is a cryptocurrency leveraging blockchain technology to store transactions in a distributed ledger. A blockchain is an ever-growing tree of blocks. Each block contains a number of transactions. To learn more, read the Bitcoin Wiki. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. To further interoperate with Ethereum and ERC-20 token transactions, we also created some views that abstract the blockchain ledger to be presented as a double-entry accounting ledger. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out our blog post on the Google Cloud Big Data Blog and try the sample query below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
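
    The referenced sample query is not reproduced above; as a minimal get-started sketch (assuming the public crypto_bitcoin dataset with a transactions table carrying block_timestamp and fee columns; note this scans a large table and counts against your free tier):

    # Daily transaction counts and total fees for January 2023.
    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query("""
        SELECT DATE(block_timestamp) AS day,
               COUNT(*) AS tx_count,
               SUM(fee) AS total_fees
        FROM `bigquery-public-data.crypto_bitcoin.transactions`
        WHERE block_timestamp >= TIMESTAMP '2023-01-01'
          AND block_timestamp <  TIMESTAMP '2023-02-01'
        GROUP BY day
        ORDER BY day
    """).to_dataframe()
    print(df)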

  12. Borg Traces dataset

    • figshare.com
    application/csv
    Updated Jul 16, 2024
    Cite
    Saroj Mali (2024). Borg Traces dataset [Dataset]. http://doi.org/10.6084/m9.figshare.26308690.v1
    Explore at:
    Available download formats: application/csv
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Saroj Mali
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    ClusterData 2019 traces (John Wilkes). The clusterdata-2019 trace dataset provides information about eight different Borg cells for the month of May 2019. It includes the following new information:

    • CPU usage information histograms for each 5-minute period, not just a point sample;
    • information about alloc sets (shared resource reservations used by jobs);
    • job-parent information for master/worker relationships such as MapReduce jobs.

    The 2019 traces focus on resource requests and usage, and contain no information about end users, their data, or access patterns to storage systems and other services. Because of its size (about 2.4 TiB compressed), we are only making the trace data available via Google BigQuery so that sophisticated analyses can be performed without requiring local resources. The clusterdata-2019 traces are described in this document: Google cluster-usage traces v3. You can find the download and access instructions there, as well as many more details about what is in the traces and how to interpret them. For additional background information, please refer to the 2015 Borg paper, Large-scale cluster management at Google with Borg.

  13. covid19-public-forecasts

    • kaggle.com
    zip
    Updated Aug 13, 2020
    Cite
    Google BigQuery (2020). covid19-public-forecasts [Dataset]. https://www.kaggle.com/bigquery/covid19-public-forecasts
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Aug 13, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    Description

    Context

    In partnership with the Harvard Global Health Institute, Google Cloud is releasing the COVID-19 Public Forecasts to serve as an additional resource for first responders in healthcare, the public sector, and other impacted organizations preparing for what lies ahead. These forecasts are available for free and provide a projection of COVID-19 cases, deaths, and other metrics over the next 14 days for US counties and states. For more info, see https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-is-releasing-the-covid-19-public-forecasts and https://storage.googleapis.com/covid-external/COVID-19ForecastWhitePaper.pdf

    Content

    A projection of COVID-19 cases, deaths, and other metrics over the next 14 days for US counties and states

    Acknowledgements

    Released on BigQuery by Google Cloud:

    https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-is-releasing-the-covid-19-public-forecasts

    https://pantheon.corp.google.com/marketplace/product/bigquery-public-datasets/covid19-public-forecasts

  14. The Met Public Domain Art Works

    • console.cloud.google.com
    Updated Nov 5, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:The%20Met&hl=de (2023). The Met Public Domain Art Works [Dataset]. https://console.cloud.google.com/marketplace/product/the-metropolitan-museum-of-art/the-met-public-domain-art-works?hl=de&jsmode
    Explore at:
    Dataset updated
    Nov 5, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    The Metropolitan Museum of Art, better known as the Met, provides a public domain dataset with over 200,000 objects including metadata and images. In early 2017, the Met debuted its Open Access policy to make part of its collection freely available for unrestricted use under the Creative Commons Zero designation and its own terms and conditions. This dataset provides a new view into one of the world's premier collections of fine art. The data includes both images in Google Cloud Storage and associated structured data in two BigQuery tables, objects and images (1:N). Locations of images both on The Met's website and in Google Cloud Storage are available in the BigQuery table. The metadata for this public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. The image data for this public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
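
    A minimal sketch of the 1:N join described above; the dataset ID the_met and the join key object_id are assumptions to verify against the table schemas in the BigQuery console:

    # Count images per object across the two tables.
    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query("""
        SELECT o.object_id, COUNT(i.object_id) AS image_count
        FROM `bigquery-public-data.the_met.objects` AS o
        LEFT JOIN `bigquery-public-data.the_met.images` AS i
          ON o.object_id = i.object_id
        GROUP BY o.object_id
        ORDER BY image_count DESC
        LIMIT 10
    """).to_dataframe()
    print(df)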

  15. Google Community Mobility Reports

    • console.cloud.google.com
    Updated Jun 18, 2022
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ja (2022). Google Community Mobility Reports [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19_google_mobility?hl=ja
    Explore at:
    Dataset updated
    Jun 18, 2022
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    UPDATE: The Community Mobility Reports are no longer being updated as of October 15, 2022. All historical data will remain publicly available for research purposes. This dataset aims to provide insights into what has changed in response to policies aimed at combating COVID-19. It reports movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. This dataset is intended to help remediate the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, the place categories, and how we calculate these trends and preserve privacy, visit our help center or read the data documentation. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
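
    A minimal sketch of a typical trend query (assuming the mobility_report table and the column names used in the published CSV reports, e.g. country_region and the *_percent_change_from_baseline fields):

    # National-level US mobility trends over time (sub_region_1 IS NULL keeps
    # only the country-wide rows).
    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query("""
        SELECT date,
               AVG(retail_and_recreation_percent_change_from_baseline) AS retail_recreation,
               AVG(workplaces_percent_change_from_baseline) AS workplaces
        FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
        WHERE country_region = 'United States' AND sub_region_1 IS NULL
        GROUP BY date
        ORDER BY date
    """).to_dataframe()
    print(df)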

  16. Chicago Crime (2015 - 2020)

    • kaggle.com
    zip
    Updated Dec 19, 2021
    Cite
    Ronnie (2021). Chicago Crime (2015 - 2020) [Dataset]. https://www.kaggle.com/datasets/redlineracer/chicago-crime-2015-2020
    Explore at:
    Available download formats: zip (1275046 bytes)
    Dataset updated
    Dec 19, 2021
    Authors
    Ronnie
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Chicago
    Description

    Context

    This dataset contains information on Chicago crime reported between 2015 and 2020.

    Content

    This dataset is a subset of the BigQuery public database on Chicago Crime.

    Acknowledgements

    I appreciate the efforts of BigQuery hosting and allowing access to their public databases and Kaggle for providing a space for the widespread sharing of data and knowledge.

    Inspiration

    This dataset is a useful learning tool for applying descriptive statistics, analytics, and visualisations. For example, one could look at crime trends over time, identify areas with the lowest amount of crime, calculate the probability that an arrest is made based on crime type or area, and determine the days of the week with the highest and lowest crime.
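
    A minimal sketch of the arrest-probability idea above, run against the full BigQuery public source that this Kaggle subset was drawn from (bigquery-public-data.chicago_crime.crime, which has primary_type, arrest, and year columns):

    # Arrest rate by crime type, 2015-2020.
    from google.cloud import bigquery

    client = bigquery.Client()
    df = client.query("""
        SELECT primary_type,
               COUNT(*) AS incidents,
               ROUND(AVG(IF(arrest, 1, 0)), 3) AS arrest_rate
        FROM `bigquery-public-data.chicago_crime.crime`
        WHERE year BETWEEN 2015 AND 2020
        GROUP BY primary_type
        ORDER BY incidents DESC
        LIMIT 15
    """).to_dataframe()
    print(df)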

  17. cms-medicare

    • kaggle.com
    zip
    Updated Apr 21, 2020
    Cite
    Google BigQuery (2020). cms-medicare [Dataset]. https://www.kaggle.com/datasets/bigquery/cms-medicare
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 21, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    Context

    This dataset contains Hospital General Information from the U.S. Department of Health & Human Services, published as part of the BigQuery CMS Medicare public dataset. This data contains a list of all hospitals that have been registered with Medicare. The list includes addresses, phone numbers, hospital types and quality of care information. The quality of care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.

    Sample Query

    How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.

    #standardSQL
    SELECT MTV_AVG_HOSPITAL_RATING, US_AVG_HOSPITAL_RATING
    FROM (
      SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS MTV_AVG_HOSPITAL_RATING
      FROM `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE city = 'MOUNTAIN VIEW' AND state = 'CA' AND hospital_overall_rating <> 'Not Available'
    ) MTV
    JOIN (
      SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS US_AVG_HOSPITAL_RATING
      FROM `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE hospital_overall_rating <> 'Not Available'
    )
    ON 1 = 1

    What are the most common diseases treated at hospitals that do well in the category of patient readmissions? For hospitals that achieved "Above the national average" in the category of patient readmissions, it might be interesting to review the types of diagnoses that are treated at those inpatient facilities. While this query won't provide the granular detail that went into the readmission calculation, it gives us a quick glimpse into the top diagnosis related groups (DRG), or classifications of inpatient stays, found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you could quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis related groups for hospital metrics you might be interested in.

    #standardSQL
    SELECT drg_definition, SUM(total_discharges) total_discharge_per_drg
    FROM `bigquery-public-data.cms_medicare.hospital_general_info` gi
    INNER JOIN `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
      ON gi.provider_id = ic.provider_id
    WHERE readmission_national_comparison = 'Above the national average'
    GROUP BY drg_definition
    ORDER BY total_discharge_per_drg DESC
    LIMIT 10;
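
    Either sample query can be run from a Kernel or locally with the BigQuery Python client; a minimal sketch (the SQL below is a simplified variant of the first query, keeping only the US-wide average):

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
        SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)), 2) AS us_avg_hospital_rating
        FROM `bigquery-public-data.cms_medicare.hospital_general_info`
        WHERE hospital_overall_rating <> 'Not Available'
    """
    print(client.query(sql).to_dataframe())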

  18. Taxi Trip Fare Prediction

    • kaggle.com
    zip
    Updated Dec 15, 2023
    Cite
    Nagendra Kumar Reddy Syamala (2023). Taxi Trip Fare Prediction [Dataset]. https://www.kaggle.com/datasets/nani123456789/taxi-trip-fare-prediction
    Explore at:
    Available download formats: zip (3024126 bytes)
    Dataset updated
    Dec 15, 2023
    Authors
    Nagendra Kumar Reddy Syamala
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Overview

    BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery you can query terabytes and terabytes of data without having any infrastructure to manage or needing a database administrator.

    BigQuery Machine Learning (BQML) is where data analysts can create, train, evaluate, and predict with machine learning models with minimal coding.

    Here you will explore millions of New York City yellow taxi cab trips available in a BigQuery public dataset. You will create a machine learning model inside of BigQuery to predict the fare of the cab ride given your model inputs, evaluate the performance of your model, and make predictions with it.

    You will perform the following tasks:

    • Query and explore the public taxi cab dataset.
    • Create a training and evaluation dataset to be used for batch prediction.
    • Create a forecasting (linear regression) model in BQML.
    • Evaluate the performance of your machine learning model.

    There are several model types to choose from:

    • Forecasting numeric values, like next month's sales, with linear regression (linear_reg).
    • Binary or multiclass classification, like spam or not-spam email, using logistic regression (logistic_reg).
    • k-means clustering, for when you want unsupervised learning for exploration (kmeans).

    Note: There are many additional model types used in machine learning (like neural networks and decision trees) and available using libraries like TensorFlow. At this time, BQML supports the three listed above. Follow the BQML roadmap for more information.

    For reference, we have also released a notebook with this dataset that you can explore; it uses AutoML foundational models to automatically select important features from the dataset and perform model selection.

    You can also try spectral clustering algorithms; of course that is not a supervised task, but it is related, and it lets you visualize trip fare prices so that cab drivers can easily identify high-fare trips in their respective locations.

    You could also build a forecasting model that helps cab services (like Uber or Rapido) reach their customers easily and in a short time.

    Dataset fields:

    • ⏱️ trip_duration: How long did the journey last? [in seconds]
    • 🛣️ distance_traveled: How far did the taxi travel? [in km]
    • 🧑‍🤝‍🧑 num_of_passengers: How many passengers were in the taxi?
    • 💵 fare: What's the base fare for the journey? [in INR]
    • 💲 tip: How much did the driver receive in tips? [in INR]
    • 🎀 miscellaneous_fees: Were there any additional charges during the trip? e.g. tolls, convenience fees, GST etc. [in INR]
    • 💰 total_fare: The grand total for the ride (this is your prediction target!). [in INR]
    • ⚡ surge_applied: Was there surge pricing applied? Yes or no?
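
    A minimal BQML sketch of the training and evaluation steps described above. It assumes the CSV has first been loaded into a BigQuery table; your_project.taxi.trips and the model name are hypothetical placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    # 1. Train a linear regression model that predicts total_fare from the
    #    listed fields (fare and tip are excluded to avoid leaking the target).
    client.query("""
        CREATE OR REPLACE MODEL `your_project.taxi.fare_model`
        OPTIONS (model_type = 'linear_reg', input_label_cols = ['total_fare']) AS
        SELECT trip_duration, distance_traveled, num_of_passengers,
               surge_applied, total_fare
        FROM `your_project.taxi.trips`
    """).result()

    # 2. Evaluate it (reports mean_absolute_error, r2_score, etc.).
    print(client.query(
        "SELECT * FROM ML.EVALUATE(MODEL `your_project.taxi.fare_model`)"
    ).to_dataframe())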

    IF IT IS USEFUL UPVOTE THE DATASET. THANK YOU!

  19. Data from: Stack Overflow

    • console.cloud.google.com
    Updated Aug 13, 2024
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&hl=id (2024). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/details/stack-exchange/stack-overflow?hl=id
    Explore at:
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  20. Secure 6G Education Big Data

    • kaggle.com
    zip
    Updated Feb 14, 2025
    Cite
    Ziya (2025). Secure 6G Education Big Data [Dataset]. https://www.kaggle.com/datasets/ziya07/secure-6g-education-big-data
    Explore at:
    Available download formats: zip (53461 bytes)
    Dataset updated
    Feb 14, 2025
    Authors
    Ziya
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    📌 Description

    The Secure 6G Education Big Data for MIoT-Based Online Learning dataset simulates secure data transmission in a 6G-enabled Mobile Internet of Things (MIoT) environment for online education.

    This dataset is valuable for cybersecurity research, AI-driven educational analytics, quantum cryptography studies, and secure data transmission testing in next-generation learning environments.

    🛠️ Key Features

    ✔ 1,000 Encrypted Student Records – Simulated data for research and analysis
    ✔ QKD-Enhanced Encryption – Ensuring quantum-secure data protection
    ✔ Multi-Source Data Collection – Includes exam scores, biometrics, chat logs, and learning sessions
    ✔ Transmission Over 6G Networks – High-speed, low-latency educational data flow
    ✔ Real-World Use Case Simulation – Suitable for testing AI models, encryption techniques, and big data security
    ✔ Categorized Security Levels – Sensitive student data classified based on encryption needs

    🚀 Potential Use Cases

    🔹 Cybersecurity & Encryption Research – Analyze QKD-based secure data transmission
    🔹 AI in Education – Study student performance, engagement, and learning patterns
    🔹 Quantum Cryptography Studies – Evaluate the effectiveness of quantum-secured networks
    🔹 Big Data Analytics – Investigate scalable storage and high-speed data processing
