42 datasets found

Google Ads Transparency Center
console.cloud.google.com
Updated Sep 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
Explore at:
Dataset updated
Sep 6, 2023
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Description
This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
Looker Ecommerce BigQuery Dataset
kaggle.com
Updated Jan 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 18, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Mustafa Keser
Description
Looker Ecommerce Dataset Description

CSV version of Looker Ecommerce Dataset.

Overview Dataset in BigQuery TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information >about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this >dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and >evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This >means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on >this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public >datasets.

1. distribution_centers.csv

Columns:

id: Unique identifier for each distribution center.

name: Name of the distribution center.

latitude: Latitude coordinate of the distribution center.

longitude: Longitude coordinate of the distribution center.

2. events.csv

Columns:

id: Unique identifier for each event.

user_id: Identifier for the user associated with the event.

sequence_number: Sequence number of the event.

session_id: Identifier for the session during which the event occurred.

created_at: Timestamp indicating when the event took place.

ip_address: IP address from which the event originated.

city: City where the event occurred.

state: State where the event occurred.

postal_code: Postal code of the event location.

browser: Web browser used during the event.

traffic_source: Source of the traffic leading to the event.

uri: Uniform Resource Identifier associated with the event.

event_type: Type of event recorded.

3. inventory_items.csv

Columns:

id: Unique identifier for each inventory item.

product_id: Identifier for the associated product.

created_at: Timestamp indicating when the inventory item was created.

sold_at: Timestamp indicating when the item was sold.

cost: Cost of the inventory item.

product_category: Category of the associated product.

product_name: Name of the associated product.

product_brand: Brand of the associated product.

product_retail_price: Retail price of the associated product.

product_department: Department to which the product belongs.

product_sku: Stock Keeping Unit (SKU) of the product.

product_distribution_center_id: Identifier for the distribution center associated with the product.

4. order_items.csv

Columns:

id: Unique identifier for each order item.

order_id: Identifier for the associated order.

user_id: Identifier for the user who placed the order.

product_id: Identifier for the associated product.

inventory_item_id: Identifier for the associated inventory item.

status: Status of the order item.

created_at: Timestamp indicating when the order item was created.

shipped_at: Timestamp indicating when the order item was shipped.

delivered_at: Timestamp indicating when the order item was delivered.

returned_at: Timestamp indicating when the order item was returned.

5. orders.csv

Columns:

order_id: Unique identifier for each order.

user_id: Identifier for the user who placed the order.

status: Status of the order.

gender: Gender information of the user.

created_at: Timestamp indicating when the order was created.

returned_at: Timestamp indicating when the order was returned.

shipped_at: Timestamp indicating when the order was shipped.

delivered_at: Timestamp indicating when the order was delivered.

num_of_item: Number of items in the order.

6. products.csv

Columns:

id: Unique identifier for each product.

cost: Cost of the product.

category: Category to which the product belongs.

name: Name of the product.

brand: Brand of the product.

retail_price: Retail price of the product.

department: Department to which the product belongs.

sku: Stock Keeping Unit (SKU) of the product.

distribution_center_id: Identifier for the distribution center associated with the product.

7. users.csv

Columns:

id: Unique identifier for each user.

first_name: First name of the user.

last_name: Last name of the user.

email: Email address of the user.

age: Age of the user.

gender: Gender of the user.

state: State where t...
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=en_GB (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=en_GB
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Googlehttp://google.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
Google Patents Public Data
kaggle.com
zip
Updated Sep 19, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2018
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

Content

The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

Acknowledgements

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

Banner photo by Helloquence on Unsplash
census-bureau-international
kaggle.com
zip
Updated May 6, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). census-bureau-international [Dataset]. https://www.kaggle.com/bigquery/census-bureau-international
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 6, 2020
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.

Sample Query 1

What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!

standardSQL

SELECT age.country_name, age.life_expectancy, size.country_area FROM ( SELECT country_name, life_expectancy FROM bigquery-public-data.census_bureau_international.mortality_life_expectancy WHERE year = 2016) age INNER JOIN ( SELECT country_name, country_area FROM bigquery-public-data.census_bureau_international.country_names_area where country_area > 25000) size ON age.country_name = size.country_name ORDER BY 2 DESC /* Limit removed for Data Studio Visualization */ LIMIT 10

Sample Query 2

Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.

standardSQL

SELECT age.country_name, SUM(age.population) AS under_25, pop.midyear_population AS total, ROUND((SUM(age.population) / pop.midyear_population) * 100,2) AS pct_under_25 FROM ( SELECT country_name, population, country_code FROM bigquery-public-data.census_bureau_international.midyear_population_agespecific WHERE year =2017 AND age < 25) age INNER JOIN ( SELECT midyear_population, country_code FROM bigquery-public-data.census_bureau_international.midyear_population WHERE year = 2017) pop ON age.country_code = pop.country_code GROUP BY 1, 3 ORDER BY 4 DESC /* Remove limit for visualization*/ LIMIT 10

Sample Query 3

The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.

SELECT growth.country_name, growth.net_migration, CAST(area.country_area AS INT64) AS country_area FROM ( SELECT country_name, net_migration, country_code FROM bigquery-public-data.census_bureau_international.birth_death_growth_rates WHERE year = 2017) growth INNER JOIN ( SELECT country_area, country_code FROM bigquery-public-data.census_bureau_international.country_names_area

Update frequency

Historic (none)

Dataset source

United States Census Bureau

Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
NPPES Plan and Provider Enumeration System
kaggle.com
zip
Updated Mar 20, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Medicare & Medicaid Services (2019). NPPES Plan and Provider Enumeration System [Dataset]. https://www.kaggle.com/cms/nppes
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset authored and provided by
Centers for Medicare & Medicaid Services
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions in the original HIPAA act. The primary purpose of NPPES was to develop a unique identifier for each physician that billed medicare and medicaid. This identifier is now known as the National Provider Identifier Standard (NPI) which is a required 10 digit number that is unique to an individual provider at the national level.

Once an NPI record is assigned to a healthcare provider, parts of the NPI record that have public relevance, including the provider’s name, speciality, and practice address are published in a searchable website as well as downloadable file of zipped data containing all of the FOIA disclosable health care provider data in NPPES and a separate PDF file of code values which documents and lists the descriptions for all of the codes found in the data file.

Content

The dataset contains the latest NPI downloadable file in an easy to query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized which harnesses the power of Big Query’s next-generation columnar storage format to provide an analytical view of the NPI data containing description fields for the codes based on the mappings in Data Dissemination Public File - Code Values documentation as well as external lookups to the healthcare provider taxonomy codes . While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:nppes?_ga=2.117120578.-577194880.1523455401

https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research

Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @rawpixel from Unplash.

Inspiration

What are the top ten most common types of physicians in Mountain View?

What are the names and phone numbers of dentists in California who studied public health?
Bitcoin Blockchain Historical Data
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Bitcoin Blockchain Historical Data [Dataset]. https://www.kaggle.com/bigquery/bitcoin-blockchain
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the Blockchain to store transactions in a distributed manner in order to mitigate against flaws in the financial industry.

Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.

Content

In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset. It’s updated it every 10 minutes. The data can be joined with historical prices in kernels. See available similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.

Method & Acknowledgements

Allen Day (Twitter | Medium), Google Cloud Developer Advocate & Colin Bookman, Google Cloud Customer Engineer retrieve data from the Bitcoin network using a custom client available on GitHub that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk to two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as they are now appended when new blocks are broadcast to the Bitcoin network. For additional information visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".

Photo by Andre Francois on Unsplash.

Inspiration

How many bitcoins are sent each day?

How many addresses receive bitcoin each day?

Compare transaction volume to historical prices by joining with other available data sources
Ethereum Classic Blockchain
kaggle.com
zip
Updated Mar 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Ethereum Classic Blockchain [Dataset]. https://www.kaggle.com/datasets/bigquery/crypto-ethereum-classic
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Ethereum Classic is an open-source, public, blockchain-based distributed computing platform featuring smart contract (scripting) functionality. It provides a decentralized Turing-complete virtual machine, the Ethereum Virtual Machine (EVM), which can execute scripts using an international network of public nodes. Ethereum Classic and Ethereum have a value token called "ether", which can be transferred between participants, stored in a cryptocurrency wallet and is used to compensate participant nodes for computations performed in the Ethereum Platform.

Ethereum Classic came into existence when some members of the Ethereum community rejected the DAO hard fork on the grounds of "immutability", the principle that the blockchain cannot be changed, and decided to keep using the unforked version of Ethereum. Till this day, Etherum Classic runs the original Ethereum chain.

Content

In this dataset, you will have access to Ethereum Classic (ETC) historical block data along with transactions and traces. You can access the data from BigQuery in your notebook with bigquery-public-data.crypto_ethereum_classic dataset.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum_classic.[TABLENAME]. Fork this kernel to get started.

Acknowledgements

This dataset wouldn't be possible without the help of Allen Day, Evgeny Medvedev and Yaz Khoury. This dataset uses Blockchain ETL. Special thanks to ETC community member @donsyang for the banner image.

Inspiration

One of the main questions we wanted to answer was the Gini coefficient of ETC data. We also wanted to analyze the DAO Smart Contract before and after the DAO Hack and the resulting Hardfork. We also wanted to analyze the network during the famous 51% attack and see what sort of patterns we can spot about the attacker.
census-bureau-usa
kaggle.com
zip
Updated May 18, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). census-bureau-usa [Dataset]. https://www.kaggle.com/datasets/bigquery/census-bureau-usa
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 18, 2020
Dataset authored and provided by
Google BigQuery
Area covered
United States
Description
Context :

The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)

Dataset source

United States Census Bureau

Sample Query

SELECT zipcode, population FROM bigquery-public-data.census_bureau_usa.population_by_zip_2010 WHERE gender = '' ORDER BY population DESC LIMIT 10

Terms of use

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data
h
stackexchange
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Albert Gong, stackexchange [Dataset]. https://huggingface.co/datasets/ag2435/stackexchange
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
Albert Gong
Description
StackExchange Dataset

Working doc: https://docs.google.com/document/d/1h585bH5sYcQW4pkHzqWyQqA4ape2Bq6o1Cya0TkMOQc/edit?usp=sharing

BigQuery query (see so_bigquery.ipynb): CREATE TEMP TABLE answers AS SELECT * FROM bigquery-public-data.stackoverflow.posts_answers WHERE LOWER(Body) LIKE '%arxiv%';

CREATE TEMPORARY TABLE questions AS SELECT * FROM bigquery-public-data.stackoverflow.posts_questions;

SELECT * FROM answers JOIN questions ON questions.id = answers.parent_id;

NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ag2435/stackexchange.
PatentsView Data
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). PatentsView Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patentsview
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

The USPTO grants US patents to inventors and assignees all over the world. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate redundant cleaning, converting and matching of these data by individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change.

Content

PatentsView Data is a database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.

Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

Acknowledgements

“PatentsView” by the USPTO, US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic, used under CC BY 4.0.

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patentsview

Banner photo by rawpixel on Unsplash
h
github-r-repos
huggingface.co
Updated Jun 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Falbel (2023). github-r-repos [Dataset]. https://huggingface.co/datasets/dfalbel/github-r-repos
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 6, 2023
Authors
Daniel Falbel
License
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Description
GitHub R repositories dataset

R source files from GitHub. This dataset has been created using the public GitHub datasets from Google BigQuery. This is the actual query that has been used to export the data: EXPORT DATA OPTIONS ( uri = 'gs://your-bucket/gh-r/*.parquet', format = 'PARQUET') as ( select f.id, f.repo_name, f.path, c.content, c.size from ( SELECT distinct id, repo_name, path FROM bigquery-public-data.github_repos.files where ends_with(path… See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/github-r-repos.
Medicare and Medicaid Services
kaggle.com
zip
Updated Apr 22, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). Medicare and Medicaid Services [Dataset]. https://www.kaggle.com/datasets/bigquery/sdoh-hrsa-shortage-areas
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Apr 22, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

This public dataset was created by the Centers for Medicare & Medicaid Services. The data summarize counts of enrollees who are dually-eligible for both Medicare and Medicaid program, including those in Medicare Savings Programs. “Duals” represent 20 percent of all Medicare beneficiaries, yet they account for 34 percent of all spending by the program, according to the Commonwealth Fund . As a representation of this high-needs, high-cost population, these data offer a view of regions ripe for more intensive care coordination that can address complex social and clinical needs. In addition to the high cost savings opportunity to deliver upstream clinical interventions, this population represents the county-by-county volume of patients who are eligible for both state level (Medicaid) and federal level (Medicare) reimbursements and potential funding streams to address unmet social needs across various programs, waivers, and other projects. The dataset includes eligibility type and enrollment by quarter, at both the state and county level. These data represent monthly snapshots submitted by states to the CMS, which are inherently lower than ever-enrolled counts (which include persons enrolled at any time during a calendar year.) For more information on dually eligible beneficiaries

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.sdoh_cms_dual_eligible_enrollment.

Sample Query

In what counties in Michigan has the number of dual-eligible individuals increased the most from 2015 to 2018? Find the counties in Michigan which have experienced the largest increase of dual enrollment households

duals_Jan_2015 AS ( SELECT Public_Total AS duals_2015, County_Name, FIPS FROM bigquery-public-data.sdoh_cms_dual_eligible_enrollment.dual_eligible_enrollment_by_county_and_program WHERE State_Abbr = "MI" AND Date = '2015-12-01' ),

duals_increase AS ( SELECT d18.FIPS, d18.County_Name, d15.duals_2015, d18.duals_2018, (d18.duals_2018 - d15.duals_2015) AS total_duals_diff FROM duals_Jan_2018 d18 JOIN duals_Jan_2015 d15 ON d18.FIPS = d15.FIPS )

SELECT * FROM duals_increase WHERE total_duals_diff IS NOT NULL ORDER BY total_duals_diff DESC
Google Trends
console.cloud.google.com
Updated Jun 11, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ES (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=ES
Explore at:
Dataset updated
Jun 11, 2022
Dataset provided by
Google Searchhttp://google.com/
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Description
The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
h
FinML
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Steven, FinML [Dataset]. https://huggingface.co/datasets/StevenJingfeng/FinML
Explore at:
Authors
Steven
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
FinML-Chain: A Blockchain-Integrated Dataset for Enhanced Financial Machine Learning

Data Collection for On-chain Data

We collect the data through BigQuery, and the code we used is in Query Code for querying data You can also refer to BigQuery for more information.

Collection for Off-chain Data On-chain Data Infomation

Data Files Data TypeData Content

ETH-Token-airdrop.csv Raw Data Critical indicators related to gas during… See the full description on the dataset page: https://huggingface.co/datasets/StevenJingfeng/FinML.
cms-medicare
kaggle.com
zip
Updated Apr 21, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). cms-medicare [Dataset]. https://www.kaggle.com/datasets/bigquery/cms-medicare
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Apr 21, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

This dataset contains Hospital General Information from the U.S. Department of Health & Human Services. This is the BigQuery COVID-19 public dataset. This data contains a list of all hospitals that have been registered with Medicare. This list includes addresses, phone numbers, hospital types and quality of care information. The quality of care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.

Sample Query

How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.

“#standardSQL SELECT MTV_AVG_HOSPITAL_RATING, US_AVG_HOSPITAL_RATING FROM ( SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)),2) AS MTV_AVG_HOSPITAL_RATING FROM bigquery-public-data.cms_medicare.hospital_general_info WHERE city = 'MOUNTAIN VIEW' AND state = 'CA' AND hospital_overall_rating <> 'Not Available') MTV JOIN ( SELECT ROUND(AVG(CAST(hospital_overall_rating AS int64)),2) AS US_AVG_HOSPITAL_RATING FROM bigquery-public-data.cms_medicare.hospital_general_info WHERE hospital_overall_rating <> 'Not Available') ON 1 = 1”

What are the most common diseases treated at hospitals that do well in the category of patient readmissions? For hospitals that achieved “Above the national average” in the category of patient readmissions, it might be interesting to review the types of diagnoses that are treated at those inpatient facilities. While this query won’t provide the granular detail that went into the readmission calculation, it gives us a quick glimpse into the top disease related groups (DRG)
, or classification of inpatient stays that are found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you could quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis related groups for hospital metrics you might be interested in. “#standardSQL SELECT drg_definition, SUM(total_discharges) total_discharge_per_drg FROM bigquery-public-data.cms_medicare.hospital_general_info gi INNER JOIN bigquery-public-data.cms_medicare.inpatient_charges_2015 ic ON gi.provider_id = ic.provider_id WHERE readmission_national_comparison = 'Above the national average' GROUP BY drg_definition ORDER BY total_discharge_per_drg DESC LIMIT 10;”
Sportradar Baseball dataset
kaggle.com
zip
Updated Aug 30, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sportradar (2019). Sportradar Baseball dataset [Dataset]. https://www.kaggle.com/datasets/sportradar/baseball
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Aug 30, 2019
Dataset authored and provided by
Sportradarhttp://sportradar.com/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. With this data you can effectively replay a game and rebuild basic statistics for players and teams.

Content

games_wide - Every pitch, steal, or lineup event for each at bat in the 2016 regular season.

games_post_wide - Every pitch, steal, or lineup event for each at-bat in the 2016 post season.

schedules - The schedule for every team in the regular season.

*The schemas for the games_wide and games_post_wide tables are identical.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started to learn how to safely manage analyzing large BigQuery datasets.

Acknowledgements

Dataset Source: Sportradar LLC

Use: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar. Display of data must include the phrase, “Data provided by Sportradar LLC,” and be hyperlinked to www.sportradar.com.
Point-in-Time Homelessness Count
kaggle.com
zip
Updated May 6, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). Point-in-Time Homelessness Count [Dataset]. https://www.kaggle.com/datasets/bigquery/sdoh-hud-pit-homelessness
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 6, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

This database contains the data reported in the Annual Homeless Assessment Report to Congress (AHAR). It represents a point-In-time count (PIT) of homeless individuals, as well as a housing inventory count (HIC) conducted annually.

The data represent the most comprehensive national-level assessment of homelessness in America, including PIT and HIC estimates of homelessness, as well as estimates of chronically homeless persons, homeless veterans, and homeless children and youth.

These data can be trended over time and correlated with other metrics of housing availability and affordability, in order to better understand the particular type of housing resources that may be needed from a social determinants of health perspective.

HUD captures these data annually through the Continuum of Care (CoC) program. CoC-level reporting data have been crosswalked to county levels for purposes of analysis of this dataset.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.sdoh_hud_pit_homelessness

Sample Query

What has been the change in the number of homeless veterans in the state of New York’s CoC Regions since 2012? Determine how the patterns of homeless veterans have changes across the state of New York

homeless_2018 AS ( SELECT Homeless_Veterans AS Vet18, CoC_Name
FROM bigquery-public-data.sdoh_hud_pit_homelessness.hud_pit_by_coc WHERE SUBSTR(CoC_Number,0,2) = "NY" AND Count_Year = 2018 ),

veterans_change AS ( SELECT homeless_2012.COC_Name, Vet12, Vet18, Vet18 - Vet12 AS VetChange FROM homeless_2018 JOIN homeless_2012 ON homeless_2018.CoC_Name = homeless_2012.CoC_Name )

SELECT * FROM veterans_change
COVID19 - The New York Times
kaggle.com
zip
Updated May 18, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). COVID19 - The New York Times [Dataset]. https://www.kaggle.com/bigquery/covid19-nyt
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 18, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

This is the US Coronavirus data repository from The New York Times . This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here . For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site

Sample Queries

Query 1

Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.

SELECT covid19.county, covid19.state_name, total_pop AS county_population, confirmed_cases, ROUND(confirmed_cases/total_pop *100000,2) AS confirmed_cases_per_100000, deaths, ROUND(deaths/total_pop *100000,2) AS deaths_per_100000 FROM bigquery-public-data.covid19_nyt.us_counties covid19 JOIN bigquery-public-data.census_bureau_acs.county_2017_5yr acs ON covid19.county_fips_code = acs.geo_id WHERE date = DATE_SUB(CURRENT_DATE(),INTERVAL 1 day) AND covid19.county_fips_code != "00000" ORDER BY confirmed_cases_per_100000 desc

Query 2

How do I calculate the number of new COVID-19 cases per day? This query determines the total number of new cases in each state for each day available in the dataset SELECT b.state_name, b.date, MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases FROM (SELECT state_name AS state, state_fips_code , confirmed_cases, DATE_ADD(date, INTERVAL 1 day) AS date_shift FROM bigquery-public-data.covid19_nyt.us_states WHERE confirmed_cases + deaths > 0) a JOIN bigquery-public-data.covid19_nyt.us_states b ON a.state_fips_code = b.state_fips_code AND a.date_shift = b.date GROUP BY b.state_name, date ORDER BY date desc
The GDELT Project
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The GDELT Project (2019). The GDELT Project [Dataset]. https://www.kaggle.com/datasets/gdelt/gdelt
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset authored and provided by
The GDELT Project
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existance and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?

Content

GDELT 2.0 has a wealth of features in the event database which includes events reported in articles published in 65 live translated languages, measurements of 2,300 emotions and themes, high resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.

You may find these code books helpful:
GDELT Global Knowledge Graph Codebook V2.1 (PDF)
GDELT Event Codebook V2.0 (PDF)

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. [Fork this kernel to get started][98] to learn how to safely manage analyzing large BigQuery datasets.

Acknowledgements

You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).

Facebook

Twitter

Click to copy link

Link copied

Cite

https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de

Google Ads Transparency Center

Explore at:

14 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Sep 6, 2023

Dataset provided by

Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery

Description

This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

Clear search

Close search

Google apps

Main menu

Google Ads Transparency Center

Looker Ecommerce BigQuery Dataset

Looker Ecommerce Dataset Description

1. distribution_centers.csv

2. events.csv

3. inventory_items.csv

4. order_items.csv

5. orders.csv

6. products.csv

7. users.csv

Google Analytics Sample

Google Patents Public Data

Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Content

Acknowledgements

census-bureau-international

Context

Querying BigQuery tables

Sample Query 1

standardSQL

Sample Query 2

standardSQL

Sample Query 3

Update frequency

Dataset source

NPPES Plan and Provider Enumeration System

Context

Content

Acknowledgements

Inspiration

Bitcoin Blockchain Historical Data

Context

Content

Querying BigQuery tables

Method & Acknowledgements

Inspiration

Ethereum Classic Blockchain

Context

Content

Querying BigQuery tables

Acknowledgements

Inspiration

census-bureau-usa

Context :

Dataset source

Sample Query

Terms of use

stackexchange

PatentsView Data

Context

Content

Acknowledgements

github-r-repos

Medicare and Medicaid Services

Context

Querying BigQuery tables

Sample Query

Google Trends

FinML

cms-medicare

Context

Querying BigQuery tables

Sample Query

Sportradar Baseball dataset

Context

Content

Querying BigQuery tables

Acknowledgements

Point-in-Time Homelessness Count

Context

Querying BigQuery tables

Sample Query

COVID19 - The New York Times

Context

Sample Queries

Query 1

Query 2

The GDELT Project

Context

1. `distribution_centers.csv`

2. `events.csv`

3. `inventory_items.csv`

4. `order_items.csv`

5. `orders.csv`

6. `products.csv`

7. `users.csv`