19 datasets found

Weather Data: Creating a New Table In BigQuery
kaggle.com
Updated Feb 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stephen Rokitka (2024). Weather Data: Creating a New Table In BigQuery [Dataset]. https://www.kaggle.com/datasets/stephenrokitka/weather-data-creating-a-new-table-in-bigquery
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 3, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Stephen Rokitka
Description
Dataset

This dataset was created by Stephen Rokitka

Released under Other (specified in description)

Contents
h
hacker_news_with_comments
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
KunLi, hacker_news_with_comments [Dataset]. https://huggingface.co/datasets/Linkseed/hacker_news_with_comments
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
KunLi
License
https://choosealicense.com/licenses/afl-3.0/https://choosealicense.com/licenses/afl-3.0/
Description
Dataset Card for [Dataset Name]

Dataset Summary

Hacker news until 2015 with comments. Collect from Google BigQuery open dataset. We didn't do any pre-processing except remove HTML tags.

Supported Tasks and Leaderboards

Comment Generation; News analysis with comments; Other comment-based NLP tasks.

Languages

English

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Linkseed/hacker_news_with_comments.
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=pl&inv=1&invt=Ab3yJQ (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=pl
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Googlehttp://google.com/
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store , a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data Learn more about the data The data includes The data is typical of what an ecommerce website would see and includes the following information:Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display trafficContent data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions on the Google Merchandise Store website.Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated such as fullVisitorId, or removed such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
USPTO OCE Patent Assignment Dataset
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). USPTO OCE Patent Assignment Dataset [Dataset]. https://www.kaggle.com/bigquery/uspto-oce-assignment
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

The Office of the Chief Economist (OCE) is responsible for advising the Under Secretary of Commerce for Intellectual Property and Director of the USPTO on the economic implications of policies and programs affecting the U.S. intellectual property (IP) system. The office disseminates detailed patent and trademark data, undertakes research, and conducts economic analysis on a variety of IP issues. OCE works with policy makers, collaborates with academics, and engages the public more generally through conferences it organizes, the publicly accessible research datasets it provides, and its publications.

Content

The USPTO OCE Patent Assignment Dataset contains detailed data patent assignments and other transactions recorded at the USPTO since 1970.

Acknowledgements

"USPTO OCE Patent Assignment Data" by the USPTO, for public use. Marco, Alan C., Graham, Stuart J.H., Myers, Amanda F., D'Agostino, Paul A and Apple, Kirsten, "The USPTO Patent Assignment Dataset: Descriptions and Analysis" (July 27, 2015).

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_assignment

Banner photo by Jeff Sheldon on Unsplash
Human Variant Annotation Datasets
console.cloud.google.com
Updated Jul 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&inv=1&invt=Ab3i5A (2022). Human Variant Annotation Datasets [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/human-variant-annotation-public
Explore at:
Dataset updated
Jul 16, 2022
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
These datasets are important to genomics researchers because they characterize several aspects of what the scientific community has learned to date about human sequence variants. Making this human annotation data freely available in GCP will enable researchers to focus less on data movement and management tasks associated with procuring this data and instead make immediate use of the data to better understand the clinical relevance of particular variant such as disease causing or protective variants (ClinVar), search a catalog of SNPs that have been identified in the human genome (dbSNP), and discover how frequently a particular variant occurs across the human population (1000Genomes, ESP, ExAC, gnomAD) This human annotation dataset contains both a mirror of the original Variant Call Files (VCF) files from NCBI, NHLBI Exome Sequencing Project (ESP) and ensembl as Google Cloud Storage (GCS) objects. In addition, these human sequence variants have also been translated into a particular variant table format and made available in Google BigQuery giving researchers the ability to use cloud technology and code repositories such as the Verily Life Sciences Annotation Toolkit to perform analyses in parallel. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . This public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
Dogecoin Crypto Blockchain
kaggle.com
zip
Updated Feb 14, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Dogecoin Crypto Blockchain [Dataset]. https://www.kaggle.com/bigquery/crypto-dogecoin
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 14, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Dogecoin is an open source peer-to-peer digital currency, favored by Shiba Inus worldwide. It is qualitatively more fun while being technically nearly identical to its close relative Bitcoin. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system.

Content

You can access the data from BigQuery in your notebook with bigquery-public-data.crypto_dogecoin dataset.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_dogecoin.[TABLENAME].

Acknowledgements

This dataset wouldn't be possible without the help of BigQuery and all of their contributions to public data.
Cooperative Patent Classification (CPC) Data
kaggle.com
zip
Updated Mar 7, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Cooperative Patent Classification (CPC) Data [Dataset]. https://www.kaggle.com/bigquery/cpc
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 7, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

The CPC is the result of a partnership between the EPO and the USPTO in their joint effort to develop a common, internationally compatible classification system for technical documents, in particular patent publications, which will be used by both offices in the patent granting process.

Content

Cooperative Patent Classification Data contains the scheme and definitions of the Cooperative Patent Classification system for classifying patent documents.

Acknowledgments

“Cooperative Patent Classification” by the EPO and USPTO, for public use. Modifications have been made to parse the XML description sections to extract references to other classification symbols.

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:cpc

Banner photo by Helloquence on Unsplash
International Census Data
console.cloud.google.com
Updated Nov 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:United%20States%20Census%20Bureau&inv=1&invt=Ab3y7Q (2019). International Census Data [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/international-census-data
Explore at:
Dataset updated
Nov 19, 2019
Dataset provided by
Googlehttp://google.com/
Description
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates. Note: The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
New York City Taxi Fare BigQuery Dataset
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DJ Sterling (2019). New York City Taxi Fare BigQuery Dataset [Dataset]. https://www.kaggle.com/dster/nyc-taxi-fare-bigquery-dataset
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Authors
DJ Sterling
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
New York
Description
BigQuery table with the training and test datasets for the New York City Taxi Fare Prediction Competition
Ethereum Classic Blockchain
kaggle.com
zip
Updated Mar 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The citation is currently not available for this dataset.
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Ethereum Classic is an open-source, public, blockchain-based distributed computing platform featuring smart contract (scripting) functionality. It provides a decentralized Turing-complete virtual machine, the Ethereum Virtual Machine (EVM), which can execute scripts using an international network of public nodes. Ethereum Classic and Ethereum have a value token called "ether", which can be transferred between participants, stored in a cryptocurrency wallet and is used to compensate participant nodes for computations performed in the Ethereum Platform.

Ethereum Classic came into existence when some members of the Ethereum community rejected the DAO hard fork on the grounds of "immutability", the principle that the blockchain cannot be changed, and decided to keep using the unforked version of Ethereum. Till this day, Etherum Classic runs the original Ethereum chain.

Content

In this dataset, you will have access to Ethereum Classic (ETC) historical block data along with transactions and traces. You can access the data from BigQuery in your notebook with bigquery-public-data.crypto_ethereum_classic dataset.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum_classic.[TABLENAME]. Fork this kernel to get started.

Acknowledgements

This dataset wouldn't be possible without the help of Allen Day, Evgeny Medvedev and Yaz Khoury. This dataset uses Blockchain ETL. Special thanks to ETC community member @donsyang for the banner image.

Inspiration

One of the main questions we wanted to answer was the Gini coefficient of ETC data. We also wanted to analyze the DAO Smart Contract before and after the DAO Hack and the resulting Hardfork. We also wanted to analyze the network during the famous 51% attack and see what sort of patterns we can spot about the attacker.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
NEAR Protocol
console.cloud.google.com
Updated Sep 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=en-GB&inv=1&invt=Ab3x2g (2023). NEAR Protocol [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/crypto-near-mainnet?hl=en-GB
Explore at:
Dataset updated
Sep 25, 2023
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Googlehttp://google.com/
Description
NEAR is a user-friendly and carbon-neutral blockchain, built from the ground up to be performant, secure, and infinitely scalable. In technical terms, NEAR is a layer one , sharded , proof-of-stake blockchain built with usability in mind. In simple terms, NEAR is blockchain for everyone. NEAR Crypto Public Dataset on Google Cloud This dataset for NEAR blockchain contains the source code for ingesting NEAR Protocol data stored as OLAP-formatted text on Google Cloud BigQuery. The source data is streamed and transformed into cleaned and enriched tables. Who is this for? Blockchain data indexing is for anyone who wants to make sense of blockchain data. This includes: Users: create queries to track NEAR assets,monitor transactions, or analyze onchain events at massive scale Researchers: use indexed for data science tasks including onchain activities, identifying trends, or feed AI/ML pipelines for predective analysis Startups: can use NEAR's indexed data for deep insights on user engagement, smart contract utilization, or insights across tokens and NFT adoption Benefits Near instant insights: Historical onchain data queried at scale Cost-effective:eliminate the need to store and process bulk NEAR protocol data; query as little or as much data as preferred Easy to use: no prior experience with blockchain technology required; bring a general knowledge of SQL to unlock insights. Extract to powerful collaborative services such as Google Connected Sheets , Looker Data Studio , or integrate to third party tools like Tableau
The Met Public Domain Art Works
kaggle.com
zip
Updated Mar 20, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Metropolitan Museum of Art (2019). The Met Public Domain Art Works [Dataset]. https://www.kaggle.com/metmuseum/the-met
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset authored and provided by
The Metropolitan Museum of Art
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Metropolitan Museum of Art, better known as the Met, provides a public domain dataset with over 200,000 objects including metadata and images. In early 2017, the Met debuted their Open Access policy to make part of their collection freely available for unrestricted use under the Creative Commons Zero designation and their own terms and conditions.

Content

This dataset provides a new view to one of the world’s premier collections of fine art. The data includes both image in Google Cloud Storage, and associated structured data in two BigQuery two tables, objects and images (1:N). Locations to images on both The Met’s website and in Google Cloud Storage are available in the BigQuery table.

Fork this kernel to get started with this dataset.

https://cloud.google.com/blog/big-data/2017/08/images/150177792553261/met03.png" alt=""> https://cloud.google.com/blog/big-data/2017/08/images/150177792553261/met03.png

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:the_met

https://console.cloud.google.com/launcher/details/the-metropolitan-museum-of-art/the-met-public-domain-art-works

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.metmuseum.org/about-the-met/policies-and-documents/image-resources — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @danieltong from Unplash.

Inspiration

What are the types of art by department?

What are the earliest photographs in the collection?

What was the most prolific period for ancient Egyptian Art?
Dash Crypto Blockchain
kaggle.com
zip
Updated Feb 14, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Dash Crypto Blockchain [Dataset]. https://www.kaggle.com/bigquery/crypto-dash
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 14, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Dash is a cryptocurrency governed by a decentralized autonomous organization (DAO) run by a subset of users, called "masternodes". The currency permits fast transactions that are difficult to trace. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system.

Content

You can access the data from BigQuery in your notebook with bigquery-public-data.crypto_dash dataset.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_dash.[TABLENAME].

Acknowledgements

This dataset wouldn't be possible without the help of BigQuery and all of their contributions to public data.
USPTO OCE Patent Litigation Docket Reports Data
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). USPTO OCE Patent Litigation Docket Reports Data [Dataset]. https://www.kaggle.com/bigquery/uspto-oce-litigation
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

OCE collected all of the data from the Public Access to Court Electronics Records (PACER) and RECAP, an independent project designed to serve as a repository for litigation data sourced from PACER. The final output datasets include information on the litigating parties involved and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, which cover more than 5 million separate documents contained in the case docket reports.

Content

USPTO OCE Patent Litigation Docket Reports Data contains detailed patent litigation data on 74,623 unique district court cases filed during the period 1963-2015.

Acknowledgements

"USPTO OCE Patent Litigation Docket Reports Data" by the USPTO, for public use. Marco, A., A. Tesfayesus, A. Toole (2017). “Patent Litigation Data from US District Court Electronic Records (1963-2015).” USPTO Economic Working Paper No. 2017-06.

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_litigation

Banner photo by Samuel Zeller on Unsplash
United States International Census
kaggle.com
zip
Updated Aug 30, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
US Census Bureau (2019). United States International Census [Dataset]. https://www.kaggle.com/datasets/census/census-bureau-international
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Aug 30, 2019
Dataset provided by
United States Census Bureauhttp://census.gov/
Authors
US Census Bureau
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

The United States Census Bureau’s International Dataset provides estimates of country populations since 1950 and projections through 2050.

Content

The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000. Specifically, the data set includes midyear population figures broken down by age and gender assignment at birth. Additionally, they provide time-series data for attributes including fertility rates, birth rates, death rates, and migration rates.

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_international

https://cloud.google.com/bigquery/public-data/international-census

Dataset Source: www.census.gov

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source -http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by Steve Richey from Unsplash.

Inspiration

What countries have the longest life expectancy?

Which countries have the largest proportion of their population under 25?

Which countries are seeing the largest net migration?
census-bureau-international
kaggle.com
zip
Updated May 6, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The citation is currently not available for this dataset.
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 6, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.

Sample Query 1

What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!

standardSQL

SELECT age.country_name, age.life_expectancy, size.country_area FROM ( SELECT country_name, life_expectancy FROM bigquery-public-data.census_bureau_international.mortality_life_expectancy WHERE year = 2016) age INNER JOIN ( SELECT country_name, country_area FROM bigquery-public-data.census_bureau_international.country_names_area where country_area > 25000) size ON age.country_name = size.country_name ORDER BY 2 DESC /* Limit removed for Data Studio Visualization */ LIMIT 10

Sample Query 2

Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.

standardSQL

SELECT age.country_name, SUM(age.population) AS under_25, pop.midyear_population AS total, ROUND((SUM(age.population) / pop.midyear_population) * 100,2) AS pct_under_25 FROM ( SELECT country_name, population, country_code FROM bigquery-public-data.census_bureau_international.midyear_population_agespecific WHERE year =2017 AND age < 25) age INNER JOIN ( SELECT midyear_population, country_code FROM bigquery-public-data.census_bureau_international.midyear_population WHERE year = 2017) pop ON age.country_code = pop.country_code GROUP BY 1, 3 ORDER BY 4 DESC /* Remove limit for visualization*/ LIMIT 10

Sample Query 3

The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.

SELECT growth.country_name, growth.net_migration, CAST(area.country_area AS INT64) AS country_area FROM ( SELECT country_name, net_migration, country_code FROM bigquery-public-data.census_bureau_international.birth_death_growth_rates WHERE year = 2017) growth INNER JOIN ( SELECT country_area, country_code FROM bigquery-public-data.census_bureau_international.country_names_area

Update frequency

Historic (none)

Dataset source

United States Census Bureau

Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
World Bank: GHNP Data
kaggle.com
zip
Updated Mar 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank (2019). World Bank: GHNP Data [Dataset]. https://www.kaggle.com/theworldbank/world-bank-health-population
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset authored and provided by
World Bankhttps://www.worldbank.org/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.

Update Frequency: Biannual

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics

https://cloud.google.com/bigquery/public-data/world-bank-hnp

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Citation: The World Bank: Health Nutrition and Population Statistics

Banner Photo by @till_indeman from Unplash.

Inspiration

What’s the average age of first marriages for females around the world?
NCAA Basketball
kaggle.com
zip
Updated Mar 20, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NCAA (2019). NCAA Basketball [Dataset]. https://www.kaggle.com/datasets/ncaa/ncaa-basketball
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset provided by
National Collegiate Athletic Associationhttp://ncaa.com/
Authors
NCAA
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Overview

This dataset contains data about NCAA Basketball games, teams, and players. Game data covers play-by-play and box scores back to 2009, as well as final scores back to 1996. Additional data about wins and losses goes back to the 1894-5 season in some teams' cases.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started to learn how to safely manage analyzing large BigQuery datasets.

Acknowledgements

Sportradar: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar.

NCAA®: Copyright National Collegiate Athletic Association. Access to data is provided solely for internal research and testing purposes, and may not be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from the National Collegiate Athletic Association.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Stephen Rokitka (2024). Weather Data: Creating a New Table In BigQuery [Dataset]. https://www.kaggle.com/datasets/stephenrokitka/weather-data-creating-a-new-table-in-bigquery

Weather Data: Creating a New Table In BigQuery

Google Data Analytics II - Chapter I Review Assignment (Task 1)

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Feb 3, 2024

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Stephen Rokitka

Description

Dataset

This dataset was created by Stephen Rokitka

Released under Other (specified in description)

Clear search

Close search

Google apps

Main menu

Weather Data: Creating a New Table In BigQuery

Dataset

Contents

hacker_news_with_comments

Google Analytics Sample

USPTO OCE Patent Assignment Dataset

Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Content

Acknowledgements

Human Variant Annotation Datasets

Dogecoin Crypto Blockchain

Context

Content

Querying BigQuery tables

Acknowledgements

Cooperative Patent Classification (CPC) Data

Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Content

Acknowledgments

International Census Data

New York City Taxi Fare BigQuery Dataset

Ethereum Classic Blockchain

Context

Content

Querying BigQuery tables

Acknowledgements

Inspiration

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

NEAR Protocol

The Met Public Domain Art Works

Context

Content

Acknowledgements

Inspiration

Dash Crypto Blockchain

Context

Content

Querying BigQuery tables

Acknowledgements

USPTO OCE Patent Litigation Docket Reports Data

Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Content

Acknowledgements

United States International Census

Context

Content

Acknowledgements

Inspiration

census-bureau-international

Context

Querying BigQuery tables

Sample Query 1

standardSQL

Sample Query 2

standardSQL

Sample Query 3

Update frequency

Dataset source

World Bank: GHNP Data

Context

Content

Acknowledgements

Inspiration

NCAA Basketball

Overview

Querying BigQuery tables

Acknowledgements

Weather Data: Creating a New Table In BigQuery

Google Data Analytics II - Chapter I Review Assignment (Task 1)

Dataset

Contents