10 datasets found
  1. BigQuery Sample Tables

    • kaggle.com
    zip
    Updated Sep 4, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2018). BigQuery Sample Tables [Dataset]. https://www.kaggle.com/bigquery/samples
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 4, 2018
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.

    Content

    • gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

    • github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.

    • github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.

    • natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.

    • shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.

    • trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.

    • wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.

    Fork this kernel to get started.

    Acknowledgements

    Data Source: https://cloud.google.com/bigquery/sample-tables

    Banner Photo by Mervyn Chan from Unplash.

    Inspiration

    How many babies were born in New York City on Christmas Day?

    How many words are in the play Hamlet?

  2. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  3. h

    stackexchange

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Albert Gong, stackexchange [Dataset]. https://huggingface.co/datasets/ag2435/stackexchange
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Albert Gong
    Description

    StackExchange Dataset

    Working doc: https://docs.google.com/document/d/1h585bH5sYcQW4pkHzqWyQqA4ape2Bq6o1Cya0TkMOQc/edit?usp=sharing

    BigQuery query (see so_bigquery.ipynb): CREATE TEMP TABLE answers AS SELECT * FROM bigquery-public-data.stackoverflow.posts_answers WHERE LOWER(Body) LIKE '%arxiv%';

    CREATE TEMPORARY TABLE questions AS SELECT * FROM bigquery-public-data.stackoverflow.posts_questions;

    SELECT * FROM answers JOIN questions ON questions.id = answers.parent_id;

    NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ag2435/stackexchange.

  4. Ethereum Blockchain

    • kaggle.com
    zip
    Updated Mar 4, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Ethereum Blockchain [Dataset]. https://www.kaggle.com/datasets/bigquery/ethereum-blockchain
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 4, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Bitcoin and other cryptocurrencies have captured the imagination of technologists, financiers, and economists. Digital currencies are only one application of the underlying blockchain technology. Like its predecessor, Bitcoin, the Ethereum blockchain can be described as an immutable distributed ledger. However, creator Vitalik Buterin also extended the set of capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.

    Both Bitcoin and Ethereum are essentially OLTP databases, and provide little in the way of OLAP (analytics) functionality. However the Ethereum dataset is notably distinct from the Bitcoin dataset:

    • The Ethereum blockchain has as its primary unit of value Ether, while the Bitcoin blockchain has Bitcoin. However, the majority of value transfer on the Ethereum blockchain is composed of so-called tokens. Tokens are created and managed by smart contracts.

    • Ether value transfers are precise and direct, resembling accounting ledger debits and credits. This is in contrast to the Bitcoin value transfer mechanism, for which it can be difficult to determine the balance of a given wallet address.

    • Addresses can be not only wallets that hold balances, but can also contain smart contract bytecode that allows the programmatic creation of agreements and automatic triggering of their execution. An aggregate of coordinated smart contracts could be used to build a decentralized autonomous organization.

    Content

    The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.

    Our hope is that by making the data on public blockchain systems more readily available it promotes technological innovation and increases societal benefits.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum.[TABLENAME]. Fork this kernel to get started.

    Acknowledgements

    Cover photo by Thought Catalog on Unsplash

    Inspiration

    • What are the most popularly exchanged digital tokens, represented by ERC-721 and ERC-20 smart contracts?
    • Compare transaction volume and transaction networks over time
    • Compare transaction volume to historical prices by joining with other available data sources like Bitcoin Historical Data
  5. political_ads_20220630

    • kaggle.com
    zip
    Updated Jun 30, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joe Hardin (2022). political_ads_20220630 [Dataset]. https://www.kaggle.com/datasets/joehardin369/political-ads-20220630
    Explore at:
    zip(342330 bytes)Available download formats
    Dataset updated
    Jun 30, 2022
    Authors
    Joe Hardin
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    My case study for the Google analytics certificate. The data came from publicly available FCC and google data found on BigQuery. I have put each query needed to generate these tables in the about this file of each table. I am looking for interesting trends between the behavior of political ad spend with traditional media (radio and TV) vs internet (google Adspend only).

  6. Medicare and Medicaid Services

    • kaggle.com
    zip
    Updated Apr 22, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2020). Medicare and Medicaid Services [Dataset]. https://www.kaggle.com/datasets/bigquery/sdoh-hrsa-shortage-areas
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    Description

    Context

    This public dataset was created by the Centers for Medicare & Medicaid Services. The data summarize counts of enrollees who are dually-eligible for both Medicare and Medicaid program, including those in Medicare Savings Programs. “Duals” represent 20 percent of all Medicare beneficiaries, yet they account for 34 percent of all spending by the program, according to the Commonwealth Fund . As a representation of this high-needs, high-cost population, these data offer a view of regions ripe for more intensive care coordination that can address complex social and clinical needs. In addition to the high cost savings opportunity to deliver upstream clinical interventions, this population represents the county-by-county volume of patients who are eligible for both state level (Medicaid) and federal level (Medicare) reimbursements and potential funding streams to address unmet social needs across various programs, waivers, and other projects. The dataset includes eligibility type and enrollment by quarter, at both the state and county level. These data represent monthly snapshots submitted by states to the CMS, which are inherently lower than ever-enrolled counts (which include persons enrolled at any time during a calendar year.) For more information on dually eligible beneficiaries

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.sdoh_cms_dual_eligible_enrollment.

    Sample Query

    In what counties in Michigan has the number of dual-eligible individuals increased the most from 2015 to 2018? Find the counties in Michigan which have experienced the largest increase of dual enrollment households

    duals_Jan_2015 AS ( SELECT Public_Total AS duals_2015, County_Name, FIPS FROM bigquery-public-data.sdoh_cms_dual_eligible_enrollment.dual_eligible_enrollment_by_county_and_program WHERE State_Abbr = "MI" AND Date = '2015-12-01' ),

    duals_increase AS ( SELECT d18.FIPS, d18.County_Name, d15.duals_2015, d18.duals_2018, (d18.duals_2018 - d15.duals_2015) AS total_duals_diff FROM duals_Jan_2018 d18 JOIN duals_Jan_2015 d15 ON d18.FIPS = d15.FIPS )

    SELECT * FROM duals_increase WHERE total_duals_diff IS NOT NULL ORDER BY total_duals_diff DESC

  7. OpenStreetMap Public Dataset

    • console.cloud.google.com
    Updated Apr 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:OpenStreetMap&hl=de (2023). OpenStreetMap Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/openstreetmap/geo-openstreetmap?hl=de
    Explore at:
    Dataset updated
    Apr 23, 2023
    Dataset provided by
    OpenStreetMap//www.openstreetmap.org/
    Googlehttp://google.com/
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and more than two million registered users who can add data by manual survey, GPS devices, aerial photography, and other free sources. We've made available a number of tables (explained in detail below): history_* tables: full history of OSM objects planet_* tables: snapshot of current OSM objects as of Nov 2019 The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types and an additional changeset corresponding to OSM edits for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated upon with the built-in geography functions to perform geometry and feature selection, additional processing. Example analyses are given below. This dataset is part of a larger effort to make data available in BigQuery through the Google Cloud Public Datasets program . OSM itself is produced as a public good by volunteers, and there are no guarantees about data quality. Interested in learning more about how these data were brought into BigQuery and how you can use them? Check out the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  8. COVID-19 Search Trends symptoms dataset

    • console.cloud.google.com
    Updated Jun 22, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=fr (2023). COVID-19 Search Trends symptoms dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=fr
    Explore at:
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    Google Searchhttp://google.com/
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. This dataset is intended to help researchers to better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation . To visualize the data, try exploring these interactive charts and map of symptom search trends . As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  9. AlphaFold Protein Structure Database

    • console.cloud.google.com
    Updated Aug 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=en-GB (2023). AlphaFold Protein Structure Database [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold?hl=en-GB
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    License
    Description

    The AlphaFold Protein Structure Database is a collection of protein structure predictions made using the machine learning model AlphaFold. AlphaFold was developed by DeepMind , and this database was created in partnership with EMBL-EBI . For information on how to interpret, download and query the data, as well as on which proteins are included / excluded, and change log, please see our main dataset guide and FAQs . To interactively view individual entries or to download proteomes / Swiss-Prot please visit https://alphafold.ebi.ac.uk/ . The current release aims to cover most of the over 200M sequences in UniProt (a commonly used reference set of annotated proteins). The files provided for each entry include the structure plus two model confidence metrics (pLDDT and PAE). The files can be found in the Google Cloud Storage bucket gs://public-datasets-deepmind-alphafold-v4 with metadata in the BigQuery table bigquery-public-data.deepmind_alphafold.metadata . If you use this data, please cite: Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021) Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021) This public dataset is hosted in Google Cloud Storage and is available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.

  10. The GDELT Project

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The GDELT Project (2019). The GDELT Project [Dataset]. https://www.kaggle.com/datasets/gdelt/gdelt
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset authored and provided by
    The GDELT Project
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existance and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?

    Content

    GDELT 2.0 has a wealth of features in the event database which includes events reported in articles published in 65 live translated languages, measurements of 2,300 emotions and themes, high resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.

    You may find these code books helpful:
    GDELT Global Knowledge Graph Codebook V2.1 (PDF)
    GDELT Event Codebook V2.0 (PDF)

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. [Fork this kernel to get started][98] to learn how to safely manage analyzing large BigQuery datasets.

    Acknowledgements

    You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).

  11. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Google BigQuery (2018). BigQuery Sample Tables [Dataset]. https://www.kaggle.com/bigquery/samples
Organization logoOrganization logo

BigQuery Sample Tables

Sample Tables for Tutorials and Learning (BigQuery)

Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 4, 2018
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.

Content

  • gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

  • github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.

  • github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.

  • natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.

  • shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.

  • trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.

  • wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.

Fork this kernel to get started.

Acknowledgements

Data Source: https://cloud.google.com/bigquery/sample-tables

Banner Photo by Mervyn Chan from Unplash.

Inspiration

How many babies were born in New York City on Christmas Day?

How many words are in the play Hamlet?

Search
Clear search
Close search
Google apps
Main menu