20 datasets found
  1. BigQuery Sample Tables

    • kaggle.com
    zip
    Updated Sep 4, 2018
    Cite
    Google BigQuery (2018). BigQuery Sample Tables [Dataset]. https://www.kaggle.com/bigquery/samples
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 4, 2018
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.

    Content

    • gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

    • github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.

    • github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.

    • natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.

    • shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.

    • trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.

    • wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.

    Fork this kernel to get started.

    Acknowledgements

    Data Source: https://cloud.google.com/bigquery/sample-tables

    Banner Photo by Mervyn Chan from Unsplash.

    Inspiration

    How many babies were born in New York City on Christmas Day?

    How many words are in the play Hamlet?

  2. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
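
    The "real bounce rate" defined above (the percentage of visits with a single pageview, grouped by traffic source) can be sketched on a few in-memory rows. The field names `source` and `pageviews` and the sample sessions are made up for illustration; they are not the schema or contents of the ga_sessions tables.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {source: percentage of visits with exactly one pageview}."""
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for session in sessions:
        visits[session["source"]] += 1
        if session["pageviews"] == 1:
            bounces[session["source"]] += 1
    return {src: 100.0 * bounces[src] / visits[src] for src in visits}


# Hypothetical sessions, not real Google Merchandise Store data.
sample_sessions = [
    {"source": "google", "pageviews": 1},
    {"source": "google", "pageviews": 4},
    {"source": "direct", "pageviews": 1},
    {"source": "direct", "pageviews": 1},
]
rates = bounce_rate_by_source(sample_sessions)
```

    The same grouping would normally be done in SQL against the BigQuery table; the Python version only makes the arithmetic explicit.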

  3. BigQuery Sample File

    • kaggle.com
    zip
    Updated Jun 26, 2019
    + more versions
    Cite
    Ro Kar (2019). BigQuery Sample File [Dataset]. https://www.kaggle.com/datasets/rokar91/bigquery-sample-file
    Explore at:
    zip(6059 bytes)Available download formats
    Dataset updated
    Jun 26, 2019
    Authors
    Ro Kar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Ro Kar

    Released under CC0: Public Domain

    Contents

  4. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=en_GB (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=en_GB
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It is a good way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.

    • Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.

    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
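
    Because removed STRING fields come back as a placeholder rather than as nulls, it can help to normalize them before analysis. The sketch below is one possible approach, not part of the dataset's tooling: the helper names and the sample row are hypothetical, and the placeholder is matched case-insensitively because the exact casing stored in the tables is an assumption.

```python
# Placeholder text as spelled in the description; exact casing in the tables is assumed.
PLACEHOLDER = "Not available in demo dataset"


def clean_value(value):
    """Map the demo-dataset placeholder string to None; pass other values through."""
    if isinstance(value, str) and value.strip().lower() == PLACEHOLDER.lower():
        return None
    return value


def clean_row(row):
    """Apply clean_value to every field of a dict-like row."""
    return {key: clean_value(val) for key, val in row.items()}


# Hypothetical row; the field names are illustrative only.
row = {"browser": "Chrome", "city": "not available in demo dataset", "visits": None}
cleaned = clean_row(row)
```

    After this step, removed STRING fields and empty INTEGER fields are both represented as None and can be filtered the same way.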

  5. BigQuery sample Data set

    • kaggle.com
    zip
    Updated Nov 11, 2024
    Cite
    Ritu Barik (2024). BigQuery sample Data set [Dataset]. https://www.kaggle.com/ritubarik/bigquery-sample-data-set
    Explore at:
    zip(565 bytes)Available download formats
    Dataset updated
    Nov 11, 2024
    Authors
    Ritu Barik
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Ritu Barik

    Released under Apache 2.0

    Contents

  6. 1000 Cannabis Genomes Project

    • kaggle.com
    zip
    Updated Feb 26, 2019
    Cite
    Google BigQuery (2019). 1000 Cannabis Genomes Project [Dataset]. https://www.kaggle.com/bigquery/genomics-cannabis
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 26, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Cannabis is a genus of flowering plants in the family Cannabaceae.

    Source: https://en.wikipedia.org/wiki/Cannabis

    Content

    In October 2016, Phylos Bioscience released a genomic open dataset of approximately 850 strains of Cannabis via the Open Cannabis Project. In combination with other genomics datasets made available by Courtagen Life Sciences, Michigan State University, NCBI, Sunrise Medicinal, University of Calgary, University of Toronto, and Yunnan Academy of Agricultural Sciences, the total amount of publicly available data exceeds 1,000 samples taken from nearly as many unique strains.

    https://medium.com/google-cloud/dna-sequencing-of-1000-cannabis-strains-publicly-available-in-google-bigquery-a33430d63998

    These data were retrieved from the National Center for Biotechnology Information’s Sequence Read Archive (NCBI SRA), processed using the BWA aligner and FreeBayes variant caller, indexed with the Google Genomics API, and exported to BigQuery for analysis. Data are available directly from Google Cloud Storage at gs://gcs-public-data--genomics/cannabis, as well as via the Google Genomics API as dataset ID 918853309083001239, and an additional duplicated subset of only transcriptome data as dataset ID 94241232795910911, as well as in the BigQuery dataset bigquery-public-data:genomics_cannabis.

    All tables in the Cannabis Genomes Project dataset have a suffix like _201703. The suffix is referred to as [BUILD_DATE] in the descriptions below. The dataset is updated frequently as new releases become available.

    The following tables are included in the Cannabis Genomes Project dataset:

    Sample_info contains fields extracted for each SRA sample, including the SRA sample ID and other data that give indications about the type of sample. Sample types include: strain, library prep methods, and sequencing technology. See SRP008673 for an example of upstream sample data. SRP008673 is the University of Toronto sequencing of Cannabis Sativa subspecies Purple Kush.

    MNPR01_reference_[BUILD_DATE] contains reference sequence names and lengths for the draft assembly of Cannabis Sativa subspecies Cannatonic produced by Phylos Bioscience. This table contains contig identifiers and their lengths.

    MNPR01_[BUILD_DATE] contains variant calls for all included samples and types (genomic, transcriptomic) aligned to the MNPR01_reference_[BUILD_DATE] table. Samples can be found in the sample_info table. The MNPR01_[BUILD_DATE] table is exported using the Google Genomics BigQuery variants schema. This table is useful for general analysis of the Cannabis genome.

    MNPR01_transcriptome_[BUILD_DATE] is similar to the MNPR01_[BUILD_DATE] table, but it includes only the subset of transcriptomic samples. This table is useful for transcribed gene-level analysis of the Cannabis genome.
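
    Since the table names embed a build date, a small helper can expand [BUILD_DATE] into concrete names. This is a sketch under two assumptions: the suffix is always six digits, as in the _201703 example, and the dataset is addressed in the dotted form used by standard SQL. The function name is hypothetical.

```python
import re

# Dotted form of the dataset the description calls bigquery-public-data:genomics_cannabis.
DATASET = "bigquery-public-data.genomics_cannabis"
# Only the tables the description writes with a [BUILD_DATE] suffix.
PREFIXES = ["MNPR01_reference", "MNPR01", "MNPR01_transcriptome"]


def table_names(build_date):
    """Expand [BUILD_DATE] for the dated tables listed above."""
    if not re.fullmatch(r"\d{6}", build_date):
        raise ValueError("build_date is expected to look like 201703")
    return [f"{DATASET}.{prefix}_{build_date}" for prefix in PREFIXES]


names = table_names("201703")
```

    Keeping the build date in one place makes it easier to rerun the same queries when a newer release of the tables appears.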

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: http://opencannabisproject.org/

    Category: Genomics

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://www.ncbi.nlm.nih.gov/home/about/policies.shtml - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Update frequency: As additional data are released to GenBank

    View in BigQuery: https://bigquery.cloud.google.com/dataset/bigquery-public-data:genomics_cannabis

    View in Google Cloud Storage: gs://gcs-public-data--genomics/cannabis

    Banner Photo by Rick Proctor from Unsplash.

    Inspiration

    Which Cannabis samples are included in the variants table?

    Which contigs in the MNPR01_reference_[BUILD_DATE] table have the highest density of variants?

    How many variants does each sample have at the THC Synthase gene (THCA1) locus?

  7. DataForSEO Google Keyword Database, historical and current

    • datarade.ai
    .json, .csv
    Updated Mar 14, 2023
    Cite
    DataForSEO (2023). DataForSEO Google Keyword Database, historical and current [Dataset]. https://datarade.ai/data-products/dataforseo-google-keyword-database-historical-and-current-dataforseo
    Explore at:
    .json, .csvAvailable download formats
    Dataset updated
    Mar 14, 2023
    Dataset authored and provided by
    DataForSEO
    Area covered
    Canada, Cyprus, Uruguay, Spain, Bangladesh, Bolivia (Plurinational State of), El Salvador, Bahrain, Singapore, Turkey
    Description

    You can check the field descriptions in the documentation: current Keyword database: https://docs.dataforseo.com/v3/databases/google/keywords/?bash; Historical Keyword database: https://docs.dataforseo.com/v3/databases/google/history/keywords/?bash. You don’t have to download fresh data dumps in JSON or CSV; we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.

  8. BigQuery Sample File

    • kaggle.com
    zip
    Updated Sep 28, 2019
    Cite
    Srijan Singh (2019). BigQuery Sample File [Dataset]. https://www.kaggle.com/srijansingh53/bigquery-sample-file
    Explore at:
    zip(5605 bytes)Available download formats
    Dataset updated
    Sep 28, 2019
    Authors
    Srijan Singh
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Dataset

    This dataset was created by Srijan Singh

    Released under GPL 2

    Contents

  9. Limite de Bairros

    • hub.arcgis.com
    • data.rio
    • +1more
    Updated Apr 16, 2020
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2020). Limite de Bairros [Dataset]. https://hub.arcgis.com/maps/PCRJ::limite-de-bairros/about
    Explore at:
    Dataset updated
    Apr 16, 2020
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Description

    Geographic database of the neighborhood (bairro) boundaries of the City of Rio de Janeiro.

    How to access it through the data lake

    BigQuery

        SELECT * FROM datario.dados_mestres.bairro LIMIT 1000

    Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our tutorial to understand how to access the data.

    Python

        import basedosdados as bd
        # To load the data directly into pandas
        df = bd.read_sql ( "SELECT * FROM datario.dados_mestres.bairro LIMIT 1000" , billing_project_id = "

  10. GitHub Repo Sample Data

    • kaggle.com
    zip
    Updated Dec 28, 2021
    Cite
    Mayur Kr. Garg (2021). GitHub Repo Sample Data [Dataset]. https://www.kaggle.com/mayur7garg/github-repo-sample-data
    Explore at:
    zip(301265354 bytes)Available download formats
    Dataset updated
    Dec 28, 2021
    Authors
    Mayur Kr. Garg
    Description

    About

    This dataset consists of samples of non-binary files, their contents and extensions from BigQuery's GitHub public sample repo data.

    File info

    This dataset consists of two CSV files:

    • filenames_with_ext.csv: lists all filenames with extensions from BigQuery's GitHub public sample repo data. Files with no extensions have been excluded.

    • filecontent_with_top_ext.csv: has samples of non-binary files, their contents and extensions from BigQuery's GitHub public sample repo data, subject to some constraints.

    Data extraction

    To understand how this data was extracted and what constraints were used, refer to the following notebook: GitHub Repo Data - mayur7garg

  11. cms-medicare

    • kaggle.com
    zip
    Updated Apr 21, 2020
    Cite
    Google BigQuery (2020). cms-medicare [Dataset]. https://www.kaggle.com/datasets/bigquery/cms-medicare
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Apr 21, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    Context

    This dataset contains Hospital General Information from the U.S. Department of Health & Human Services. This is the BigQuery COVID-19 public dataset. This data contains a list of all hospitals that have been registered with Medicare. This list includes addresses, phone numbers, hospital types and quality of care information. The quality of care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.
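
    A minimal sketch of querying this table with the google-cloud-bigquery client library is shown here. It assumes the library is installed and that credentials and a billing project are already configured, which may differ from the setup inside Kernels. The function name `run_query` is hypothetical; the column names `city`, `state` and `hospital_overall_rating` are the ones used in the sample query in this listing.

```python
TABLE = "bigquery-public-data.cms_medicare.hospital_general_info"

# Named parameters (@city, @state) keep user input out of the SQL text.
SQL = f"""
SELECT city, state, hospital_overall_rating
FROM `{TABLE}`
WHERE city = @city AND state = @state
LIMIT 10
"""


def run_query(city, state):
    """Run SQL with named parameters and return the rows as dicts."""
    # Imported here so this module still loads where the client library is absent.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("city", "STRING", city),
            bigquery.ScalarQueryParameter("state", "STRING", state),
        ]
    )
    rows = client.query(SQL, job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

    A call such as `run_query("MOUNTAIN VIEW", "CA")` would send the query to BigQuery and count against the caller's query quota.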

    Sample Query

    How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.

    #standardSQL
    SELECT
      MTV_AVG_HOSPITAL_RATING,
      US_AVG_HOSPITAL_RATING
    FROM (
      SELECT
        ROUND(AVG(CAST(hospital_overall_rating AS int64)),2) AS MTV_AVG_HOSPITAL_RATING
      FROM
        `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE
        city = 'MOUNTAIN VIEW'
        AND state = 'CA'
        AND hospital_overall_rating <> 'Not Available') MTV
    JOIN (
      SELECT
        ROUND(AVG(CAST(hospital_overall_rating AS int64)),2) AS US_AVG_HOSPITAL_RATING
      FROM
        `bigquery-public-data.cms_medicare.hospital_general_info`
      WHERE
        hospital_overall_rating <> 'Not Available')
    ON
      1 = 1

    What are the most common diseases treated at hospitals that do well in the category of patient readmissions? For hospitals that achieved “Above the national average” in the category of patient readmissions, it might be interesting to review the types of diagnoses that are treated at those inpatient facilities. While this query won’t provide the granular detail that went into the readmission calculation, it gives us a quick glimpse into the top diagnosis related groups (DRG), or classification of inpatient stays, that are found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you could quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis related groups for hospital metrics you might be interested in.

    #standardSQL
    SELECT
      drg_definition,
      SUM(total_discharges) total_discharge_per_drg
    FROM
      `bigquery-public-data.cms_medicare.hospital_general_info` gi
    INNER JOIN
      `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
    ON
      gi.provider_id = ic.provider_id
    WHERE
      readmission_national_comparison = 'Above the national average'
    GROUP BY
      drg_definition
    ORDER BY
      total_discharge_per_drg DESC
    LIMIT
      10;

  12. OnPoint Weather - Past Weather and Climatology Data Sample

    • console.cloud.google.com
    Updated May 13, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Weather%20Source&hl=zh-tw (2023). OnPoint Weather - Past Weather and Climatology Data Sample [Dataset]. https://console.cloud.google.com/marketplace/product/weathersource-com/weather-past-climatology?hl=zh-tw
    Explore at:
    Dataset updated
    May 13, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    OnPoint Weather is a global weather dataset for business available for any lat/lon point and geographic area such as ZIP codes. OnPoint Weather provides a continuum of hourly and daily weather from the year 2000 to the present and a forward forecast of 45 days. OnPoint Climatology provides hourly and daily weather statistics which can be used to determine ‘departures from normal’ and to provide climatological guidance of expected weather for any location at any point in time. OnPoint Climatology provides weather statistics such as means, standard deviations and frequency of occurrence.

    Weather has a significant impact on businesses and accounts for hundreds of billions in lost revenue annually. OnPoint Weather allows businesses to quantify weather impacts and develop strategies to optimize for weather to improve business performance.

    Examples of Usage

    • Quantify the impact of weather on sales across diverse locations and times of the year

    • Understand how supply chains are impacted by weather

    • Understand how employee attendance and performance are impacted by weather

    • Understand how weather influences foot traffic at malls, stores and restaurants

    OnPoint Weather is available through Google Cloud Platform’s Commercial Dataset Program and can be easily integrated with other Google Cloud Platform Services to quickly reveal and quantify weather impacts on business. Weather Source provides a full range of support services from answering quick questions to consulting and building custom solutions.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  13. census-bureau-international

    • kaggle.com
    zip
    Updated May 6, 2020
    + more versions
    Cite
    Google BigQuery (2020). census-bureau-international [Dataset]. https://www.kaggle.com/bigquery/census-bureau-international
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    May 6, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    Description

    Context

    The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.

    Sample Query 1

    What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!

    #standardSQL
    SELECT
      age.country_name,
      age.life_expectancy,
      size.country_area
    FROM (
      SELECT country_name, life_expectancy
      FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
      WHERE year = 2016) age
    INNER JOIN (
      SELECT country_name, country_area
      FROM `bigquery-public-data.census_bureau_international.country_names_area`
      WHERE country_area > 25000) size
    ON age.country_name = size.country_name
    ORDER BY 2 DESC
    /* Limit removed for Data Studio Visualization */
    LIMIT 10

    Sample Query 2

    Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.

    #standardSQL
    SELECT
      age.country_name,
      SUM(age.population) AS under_25,
      pop.midyear_population AS total,
      ROUND((SUM(age.population) / pop.midyear_population) * 100,2) AS pct_under_25
    FROM (
      SELECT country_name, population, country_code
      FROM `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
      WHERE year = 2017 AND age < 25) age
    INNER JOIN (
      SELECT midyear_population, country_code
      FROM `bigquery-public-data.census_bureau_international.midyear_population`
      WHERE year = 2017) pop
    ON age.country_code = pop.country_code
    GROUP BY 1, 3
    ORDER BY 4 DESC
    /* Remove limit for visualization*/
    LIMIT 10
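
    The pct_under_25 column in the query above is the sum of the age-specific populations below 25 divided by the midyear total, times 100, rounded to two decimals. The sketch below reproduces that arithmetic for one country on made-up numbers; the rows and the function name are hypothetical and do not come from the census tables.

```python
def pct_under_25(age_rows, total_population):
    """Sum population for ages below 25 and express it as a percentage of the total."""
    under_25 = sum(row["population"] for row in age_rows if row["age"] < 25)
    return under_25, round(under_25 / total_population * 100, 2)


# Hypothetical age-specific rows for a single country and year.
age_rows = [
    {"age": 0, "population": 300},
    {"age": 24, "population": 200},
    {"age": 25, "population": 100},
    {"age": 60, "population": 400},
]
under_25, pct = pct_under_25(age_rows, total_population=1000)
```

    Only the rows for ages 0 and 24 pass the `age < 25` filter, which mirrors the WHERE clause in the first subquery.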

    Sample Query 3

    The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.

    SELECT
      growth.country_name,
      growth.net_migration,
      CAST(area.country_area AS INT64) AS country_area
    FROM (
      SELECT country_name, net_migration, country_code
      FROM `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
      WHERE year = 2017) growth
    INNER JOIN (
      SELECT country_area, country_code
      FROM `bigquery-public-data.census_bureau_international.country_names_area`

    Update frequency

    Historic (none)

    Dataset source

    United States Census Bureau

    Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data

  14. census-bureau-usa

    • kaggle.com
    zip
    Updated May 18, 2020
    + more versions
    Cite
    Google BigQuery (2020). census-bureau-usa [Dataset]. https://www.kaggle.com/datasets/bigquery/census-bureau-usa
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    May 18, 2020
    Dataset authored and provided by
    Google BigQuery
    Area covered
    United States
    Description

    Context

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is presented as summaries and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole.

    Update frequency: Historic (none)

    Dataset source

    United States Census Bureau

    Sample Query

    SELECT
      zipcode,
      population
    FROM
      `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
    WHERE
      gender = ''
    ORDER BY
      population DESC
    LIMIT
      10

    Terms of use

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data

  15. Hacker News Corpus

    • kaggle.com
    zip
    Updated Jun 29, 2017
    Cite
    Hacker News (2017). Hacker News Corpus [Dataset]. https://www.kaggle.com/hacker-news/hacker-news-corpus
    Explore at:
    zip(642956855 bytes)Available download formats
    Dataset updated
    Jun 29, 2017
    Dataset authored and provided by
    Hacker News
    Description

    Context

    This dataset contains a randomized sample of roughly one quarter of all stories and comments from Hacker News from its launch in 2006. Hacker News is a social news website focusing on computer science and entrepreneurship. It is run by Paul Graham's investment fund and startup incubator, Y Combinator. In general, content that can be submitted is defined as "anything that gratifies one's intellectual curiosity".

    Content

    Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received.

    Please note that the text field includes profanity. All texts are the author’s own, do not necessarily reflect the positions of Kaggle or Hacker News, and are presented without endorsement.

    Acknowledgements

    This dataset was kindly made publicly available by Hacker News under the MIT license.

    Inspiration

    • Recent studies have found that many forums tend to be dominated by a very small fraction of users. Is this true of Hacker News?

    • Hacker News has received complaints that the site is biased towards Y Combinator startups. Do the data support this?

    • Is the amount of coverage by Hacker News predictive of a startup’s success?

    Use this dataset with BigQuery

    You can use Kernels to analyze, share, and discuss this data on Kaggle, but if you’re looking for real-time updates and bigger data, check out the data in BigQuery, too: https://cloud.google.com/bigquery/public-data/hacker-news

    The BigQuery version of this dataset has roughly four times as many articles.

  16. Global Health

    • kaggle.com
    zip
    Updated May 18, 2020
    Google BigQuery (2020). Global Health [Dataset]. https://www.kaggle.com/bigquery/world-bank-health-population
    Available download formats: zip (0 bytes)
    Dataset updated: May 18, 2020
    Dataset provided by: BigQuery (https://cloud.google.com/bigquery), Google (http://google.com/)
    Authors: Google BigQuery
    Description

    Context

    This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.

    Sample Query

    What’s the average age of first marriages for females around the world? This query retrieves the average age at first marriage for females, by country. Females are used because their ages at first marriage show a larger spread.

    SELECT country_name, ROUND(AVG(value), 2) AS average
    FROM `bigquery-public-data.world_bank_health_population.health_nutrition_population`
    WHERE indicator_code = "SP.DYN.SMAM.FE" AND year > 2000
    GROUP BY country_name
    ORDER BY average
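
    To see what the query's filter, grouping, and ordering do without a BigQuery account, the same logic can be tried against an in-memory SQLite table. This is only a sketch under stated assumptions: SQLite stands in for BigQuery, the table is reduced to the four columns the query touches, and the rows ("Country A", "Country B" and their values) are invented for illustration, not taken from the dataset.

```python
import sqlite3

# Same shape as the sample query, with single-quoted string literals for SQLite.
QUERY = """
SELECT country_name, ROUND(AVG(value), 2) AS average
FROM health_nutrition_population
WHERE indicator_code = 'SP.DYN.SMAM.FE' AND year > 2000
GROUP BY country_name
ORDER BY average
"""


def average_by_country(rows):
    """Load rows into a throwaway SQLite table and run the sample query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE health_nutrition_population "
        "(country_name TEXT, indicator_code TEXT, year INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO health_nutrition_population VALUES (?, ?, ?, ?)", rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Invented toy rows; the real table lives in BigQuery.
toy_rows = [
    ("Country A", "SP.DYN.SMAM.FE", 2005, 24.0),
    ("Country A", "SP.DYN.SMAM.FE", 2010, 26.0),
    ("Country B", "SP.DYN.SMAM.FE", 2008, 21.5),
    ("Country B", "SP.DYN.SMAM.FE", 1995, 19.0),   # dropped: year is not > 2000
    ("Country B", "SOME.OTHER.CODE", 2008, 99.0),  # dropped: different indicator
]
print(average_by_country(toy_rows))
```

    Rows outside the indicator or year filter never reach the average, and results come back sorted from the lowest average to the highest.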

  17. BigQuery_Sample_File

    • kaggle.com
    zip
    Updated Jun 27, 2019
    Muskan Goel (2019). BigQuery_Sample_File [Dataset]. https://www.kaggle.com/bt18gcs188/bigquery-sample-file
    Available download formats: zip (1375 bytes)
    Dataset updated: Jun 27, 2019
    Authors: Muskan Goel
    Description

    Dataset

    This dataset was created by Muskan Goel

    Contents

  18. Chicago Crime

    • kaggle.com
    zip
    Updated Nov 19, 2025
    Ashkan Ranjbar (2025). Chicago Crime [Dataset]. https://www.kaggle.com/ashkanranjbar/chicago-crime
    Available download formats: zip (10641044 bytes)
    Dataset updated: Nov 19, 2025
    Authors: Ashkan Ranjbar
    License: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
    Area covered: Chicago
    Description

    This dataset has gained popularity over time and is widely known. While Kaggle courses teach how to use Google BigQuery to extract a sample from it, this notebook provides a HOW-TO guide to access the dataset directly within your own notebook. Instead of uploading the entire dataset here, which is quite large, I offer several alternatives to work with a smaller portion of it. My main focus was to demonstrate various techniques to make the dataset more manageable on your own laptop, ensuring smoother operations. Additionally, I've included some interesting insights on basic descriptive statistics and even a modeling example, which can be further explored based on your preferences. I intend to revisit and refine it in the near future to enhance its rigor. Meanwhile, I welcome any suggestions to improve the notebook!

    Here are the columns that I have chosen to include (after carefully eliminating a few others):

    • Date: This column represents the timestamp of the incident. From this column, I have extracted the Month, Day, and Hour information. We can also add additional time-based columns such as Week and Day of the Week, among others.
    • Block: This column provides a partially redacted address where the incident occurred, indicating the same block as the actual address.
    • IUCR: The acronym stands for Illinois Uniform Crime Reporting. This code is directly linked to the Primary Type and Description. You can find more information about it in this link.
    • Primary Type: This column describes the primary category of the IUCR code mentioned above.
    • Description: This column provides a secondary description of the IUCR code, serving as a subcategory of the primary description.
    • Location Description: Here, you can find the description of the location where the incident took place.
    • Arrest: This column indicates whether an arrest was made in relation to the incident.
    • Domestic: It shows whether the incident was domestic-related, as defined by the Illinois Domestic Violence Act.
    • Beat: The beat refers to the smallest police geographic area, with each beat having a dedicated territory. You can find more information about it in this link.
    • District: This column represents the police district where the incident occurred.
    • Ward: It refers to the number that labels the City Council district where the incident took place.
    • Community Areas: This column indicates the community area where the incident occurred. Chicago has a total of 77 community areas.
    • FBI Code: The crime classification outlined in the FBI's National Incident-Based Reporting System (NIBRS).
    • X-Coordinate, Y-Coordinate, Latitude, Longitude, Location: These columns provide information about the geographical coordinates of the incident location, including latitude and longitude. The "Location" column contains just the latitude and longitude coordinates.
    • Year, Updated On: These columns represent the year of the incident and the date on which the dataset was last updated.
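
    The Date column description mentions deriving Month, Day, and Hour, and optionally Week and Day of the Week. A minimal sketch of that extraction follows. It assumes the timestamp is available as an ISO 8601 string, which may not match the dataset's actual format; the function name `date_parts` and the output keys are hypothetical.

```python
from datetime import datetime

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def date_parts(timestamp):
    """Split an ISO 8601 timestamp string into time-based features."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "Month": dt.month,
        "Day": dt.day,
        "Hour": dt.hour,
        "Week": dt.isocalendar()[1],       # ISO week number
        "DayOfWeek": DAY_NAMES[dt.weekday()],
    }


print(date_parts("2020-03-14T21:30:00"))
```

    If the raw column uses a different layout, `datetime.strptime` with a matching format string would replace `fromisoformat`.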

    Feel free to explore the notebook and provide any suggestions for improvement. Your feedback is highly appreciated!

  19. Customer Activity

    • kaggle.com
    zip
    Updated Nov 12, 2022
    NW Analytics (2022). Customer Activity [Dataset]. https://www.kaggle.com/datasets/nwanalytics/customer-activity/code
    Available download formats: zip (72684 bytes)
    Dataset updated: Nov 12, 2022
    Authors: NW Analytics
    Description

    Context

    Assume you are a data analyst in an EdTech company. The company’s customer success team works with an objective to help customers get the maximum value from their product by doing deeper dives into the customer's needs, wants and expectations from the product and helping them reach their goals.

    The customer success team is aiming to achieve sustainable growth by focusing on retaining the existing users.

    Therefore, your team wants to analyze the activity of your existing users and understand their performance, behaviours, and patterns to gain meaningful insights that help your customer success team make data-informed decisions.

    Expected Outcome

    1. Brainstorm and identify the right metrics and frame proper questions for analysis. Your analysis should help your customer success team understand:
      • What the current retention of users is
      • How users are engaging with the content
      • How efficiently their discussions are being resolved
    2. In case you identify any outliers in the data set, make a note of them and exclude them from your analysis.
    3. Build the best suitable dashboard presenting your insights.

    Your recommendations must be backed by meaningful insights and professional visualizations which will help your customer success team design road maps, strategies, and action items to achieve the goal.

    Tools to use:

    1. Google Data Studio (preferred), Tableau, Power BI, or any other visualization tool
    2. You can use BigQuery SQL if you wish; it is not mandatory

    Overview of the Dataset

    The dataset contains the basic details of the enrolled users, their learning resource completion percentages, their activities on the platform, and the structure of the learning resources available on the platform.

    1. users_basic_details: Contains basic details of the enrolled users.

    2. day_wise_user_activity: Contains the details of the day-wise learning activity of the users.
      • A user shall have one entry for a lesson in a day.

    3. learning_resource_details: Contains the details of learning resources offered to the enrolled users.
      • Content is stored in a hierarchical structure: Track → Course → Topic → Lesson. A lesson can be a video, practice, exam, etc.
      • Example: Tech Foundations → Developer Foundations → Topic 1 → lesson 1

    4. feedback_details: Contains the feedback details/rating given by the user to a particular lesson.
      • Feedback rating is given on a scale of 1 to 5, 5 being the highest.
      • A user can give feedback to the same lesson multiple times.

    5. discussion_details: Contains the details of the discussions created by the user for a particular lesson.

    6. discussion_comment_details: Contains the details of the comments posted for the discussions created by the user.
      • Comments may be posted by mentors or users themselves.
      • The role of mentors is to guide and help the users by resolving the doubts and issues faced by them related to their learning activity.
      • A discussion can have multiple comments.

    Tables Description

    users_basic_details:

    • user_id: unique id of the user [string]
    • gender: gender of the enrolled user [string]
    • current_city: city of residence of the user [string]
    • batch_start_datetime: start datetime of the batch, for which the user is enrolled [datetime]
    • referral_source: referral channel of the user [string]
    • highest_qualification: highest qualification (education details) of the enrolled user [string]

    day_wise_user_activity:

    • activity_datetime: date and time of learning of the user [datetime]
    • user_id: unique id of the user [string]
    • lesson_id: unique id of the lesson [string]
    • lesson_type: type of the lesson. It can be "SESSION", "PRACTICE", "EXAM" or "PROJECT" [string]
    • day_completion_percentage: percent of the lesson completed by the user on a particular day (out of 100%) [float]
      • The completion percentage is calculated as: learnt duration of the lesson on a day / total duration of the lesson * 100
    • overall_completion_percentage: overall completion percentage of the lesson till date by the user (out of 100%) [float]

      • Example: If a user who started a lesson on Jan 1, '22 completes it in parts (10%, 35%, 37%, and 18%) on 4 different days, then:
        • Jan 1, '22: day_completion_percentage 10%, overall_completion_percentage 10%
        • Jan 3, '22: day_completion_percentage 35%, overall_completion_percentage 45%
        • Jan 4, '22: day_completion_percentage 37%, overall_completion_percentage 82%
        • Jan 6, '22: day_completion_percentage 18%, overall_completion_percentage 100%
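
    The relationship between the two percentage columns can be sketched in a few lines: the daily value follows the stated formula, and the overall value is a running total of the daily values. The function names are hypothetical, and capping the running total at 100 is an assumption made here, not something the dataset description states.

```python
from itertools import accumulate


def day_completion_percentage(learnt_duration, total_duration):
    """Percent of a lesson learnt on one day: learnt / total * 100."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    return learnt_duration / total_duration * 100


def overall_completion(day_percentages):
    """Running total of per-day percentages (capped at 100 by assumption)."""
    return [min(total, 100) for total in accumulate(day_percentages)]


# The four study days from the example above.
daily = [10, 35, 37, 18]
print(overall_completion(daily))  # [10, 45, 82, 100]
```

    Under this reading, overall_completion_percentage on any day is simply the sum of that lesson's day_completion_percentage values up to and including that day.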

    learning_resource_details:

    • track_id: unique id of the track [string]
    • track_title: name of the track [string]
    • course_id: unique id of the course [string]
    • **`...
  20. Google 2019 Cluster sample

    • kaggle.com
    zip
    Updated Feb 4, 2022
    Derrick Mwiti (2022). Google 2019 Cluster sample [Dataset]. https://www.kaggle.com/datasets/derrickmwiti/google-2019-cluster-sample/data
    Available download formats: zip (101383815 bytes)
    Dataset updated: Feb 4, 2022
    Authors: Derrick Mwiti
    License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    From https://research.google/tools/datasets/google-cluster-workload-traces-2019/

    This is a trace of the workloads running on eight Google Borg compute clusters for the month of May 2019. The trace describes every job submission, scheduling decision, and resource usage data for the jobs that ran in those clusters.

    It builds on the May 2011 trace of one cluster, which has enabled a wide range of research on advancing the state-of-the-art for cluster schedulers and cloud computing, and has been used to generate hundreds of analyses and studies.

    Since 2011, machines and software have evolved, workloads have changed, and the importance of workload variance has become even clearer. The new trace allows researchers to explore these changes. The new dataset includes additional data, including:

    • CPU usage information histograms for each 5-minute period, not just a point sample;
    • information about alloc sets (shared resource reservations used by jobs); and
    • job-parent information for master/worker relationships such as MapReduce jobs.

    Just like the last trace, these new ones focus on resource requests and usage, and contain no information about end users, their data, or access patterns to storage systems and other services.

    The trace data is being made available via Google BigQuery so that sophisticated analyses can be performed without requiring local resources. This site provides access instructions and a detailed description of what the traces contain.

    https://drive.google.com/file/d/10r6cnJ5cJ89fPWCgj7j4LtLBqYN9RiI9/view


Google BigQuery (2018). BigQuery Sample Tables [Dataset]. https://www.kaggle.com/bigquery/samples

BigQuery Sample Tables

Sample Tables for Tutorials and Learning (BigQuery)

Available download formats: zip (0 bytes)
Dataset updated: Sep 4, 2018
Dataset provided by: BigQuery (https://cloud.google.com/bigquery), Google (http://google.com/)
Authors: Google BigQuery
License: https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.

Content

  • gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

  • github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.

  • github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.

  • natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.

  • shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.

  • trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.

  • wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.

Fork this kernel to get started.

Acknowledgements

Data Source: https://cloud.google.com/bigquery/sample-tables

Banner Photo by Mervyn Chan from Unsplash.

Inspiration

How many babies were born in New York City on Christmas Day?

How many words are in the play Hamlet?
