100+ datasets found
  1. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Apr 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de&inv=1&invt=Ab2chA (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  2. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  3. Data from: Stack Overflow

    • console.cloud.google.com
    Updated Mar 4, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&inv=1&invt=Ab1KXg (2020). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow
    Explore at:
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Googlehttp://google.com/
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  4. SEC Public Dataset

    • console.cloud.google.com
    Updated Aug 18, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-TW&inv=1&invt=Ab2c-Q (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-TW
    Explore at:
    Dataset updated
    Aug 18, 2023
    Dataset provided by
    Googlehttp://google.com/
    Description

    In the U.S. public companies, certain insiders and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter going back to January, 2009. To aid analysis a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.瞭解詳情

  5. theLook eCommerce

    • console.cloud.google.com
    Updated Nov 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&inv=1&invt=Ab2Y8Q (2022). theLook eCommerce [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce
    Explore at:
    Dataset updated
    Nov 28, 2022
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.What is BigQuery .

  6. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/nycopendata/new-york
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png" alt="enter image description here"> https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

  7. San Francisco Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasf/san-francisco
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    DataSF
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    San Francisco
    Description

    Context

    DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

    https://datasf.org/about/

    Content

    This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

    • This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
    • This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
    • This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
    • This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    http://datasf.org/

    Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @meric from Unplash.

    Inspiration

    Which neighborhoods have the highest proportion of offensive graffiti?

    Which complaint is most likely to be made using Twitter and in which neighborhood?

    What are the most complained about Muni stops in San Francisco?

    What are the top 10 incident types that the San Francisco Fire Department responds to?

    How many medical incidents and structure fires are there in each neighborhood?

    What’s the average response time for each type of dispatched vehicle?

    Which category of police incidents have historically been the most common in San Francisco?

    What were the most common police incidents in the category of LARCENY/THEFT in 2016?

    Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

    What is the average tree diameter?

    What is the highest number of a particular species of tree planted in a single year?

    Which San Francisco locations feature the largest number of trees?

  8. geo-openstreetmap

    • kaggle.com
    zip
    Updated Apr 17, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2020). geo-openstreetmap [Dataset]. https://www.kaggle.com/bigquery/geo-openstreetmap
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Apr 17, 2020
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and more than two million registered users who can add data by manual survey, GPS devices, aerial photography, and other free sources.

    To aid researchers, data scientists, and analysts in the effort to combat COVID-19, Google is making a hosted repository of public datasets including OpenStreetMap data, free to access. To facilitate the Kaggle community to access the BigQuery dataset, it is onboarded to Kaggle platform which allows querying it without a linked GCP account. Please note that due to the large size of the dataset, Kaggle applies a quota of 5 TB of data scanned per user per 30-days.

    Content

    This is the OpenStreetMap (OSM) planet-wide dataset loaded to BigQuery.

    Tables: - history_* tables: full history of OSM objects. - planet_* tables: snapshot of current OSM objects as of Nov 2019.

    The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types and an additional changeset corresponding to OSM edits for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated upon with the built-in geography functions to perform geometry and feature selection, additional processing.

    Resources

    You can read more about OSM elements on the OSM Wiki. This dataset uses BigQuery GEOGRAPHY datatype which supports a set of functions that can be used to analyze geographical data, determine spatial relationships between geographical features, and construct or manipulate GEOGRAPHYs.

  9. Data from: Hacker News

    • console.cloud.google.com
    Updated Jul 21, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Y%20Combinator&inv=1&invt=Ab2YeQ (2018). Hacker News [Dataset]. https://console.cloud.google.com/marketplace/product/y-combinator/hacker-news
    Explore at:
    Dataset updated
    Jul 21, 2018
    Dataset provided by
    Googlehttp://google.com/
    Description

    This dataset contains all stories and comments from Hacker News from its launch in 2006 to present. Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  10. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.govhttps://data.gov/
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  11. The Met Public Domain Art Works

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Metropolitan Museum of Art (2019). The Met Public Domain Art Works [Dataset]. https://www.kaggle.com/datasets/metmuseum/the-met
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    The Metropolitan Museum of Art
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Metropolitan Museum of Art, better known as the Met, provides a public domain dataset with over 200,000 objects including metadata and images. In early 2017, the Met debuted their Open Access policy to make part of their collection freely available for unrestricted use under the Creative Commons Zero designation and their own terms and conditions.

    Content

    This dataset provides a new view to one of the world’s premier collections of fine art. The data includes both image in Google Cloud Storage, and associated structured data in two BigQuery two tables, objects and images (1:N). Locations to images on both The Met’s website and in Google Cloud Storage are available in the BigQuery table.

    Fork this kernel to get started with this dataset.

    https://cloud.google.com/blog/big-data/2017/08/images/150177792553261/met03.png" alt=""> https://cloud.google.com/blog/big-data/2017/08/images/150177792553261/met03.png

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:the_met

    https://console.cloud.google.com/launcher/details/the-metropolitan-museum-of-art/the-met-public-domain-art-works

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.metmuseum.org/about-the-met/policies-and-documents/image-resources — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @danieltong from Unplash.

    Inspiration

    What are the types of art by department?

    What are the earliest photographs in the collection?

    What was the most prolific period for ancient Egyptian Art?

  12. USPTO Patent Examiner Data System (PEDS) Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO Patent Examiner Data System (PEDS) Data [Dataset]. https://www.kaggle.com/bigquery/uspto-peds
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Patent Examination Data System gives users access to multiple records of USPTO patent application or patent filing status at no cost. PEDS is updated daily and mirrors the data available in the Patent Application Location and Monitoring system (PALM). PEDS provides access to public applications including: published patent applications and patents. PCT applications that have not been published by WIPO. Any applications that have not been released by the USPTO will not be available in PEDS.

    Content

    USPTO Patent Examiner Data System (PEDS) API Data contains data from the examination process of USPTO patent applications. PEDS contains the bibliographic, published document and patent term extension data tabs in Public PAIR from 1981 to present. There is also some data dating back to 1935.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    "Patent Examination Data System" by the USPTO, for public use.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_peds

    Banner photo by Thought Catalog on Unsplash

  13. c

    ckanext-datastore-bigquery - Extensions - CKAN Ecosystem Catalog

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). ckanext-datastore-bigquery - Extensions - CKAN Ecosystem Catalog [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-datastore-bigquery
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The datastore-bigquery extension for CKAN allows users to leverage Google Cloud BigQuery for datastore search and SQL queries, providing an alternative to CKAN's standard datastore. By integrating with BigQuery, this extension aims to enhance performance and scalability for data-intensive operations against data stored as BigQuery tables. This plugin allows CKAN to query data that actually resides in Google BigQuery. Key Features: BigQuery Integration: Enables CKAN's datastore search and datastore SQL API to query data directly from Google BigQuery tables. Alternative to Standard Datastore: Offers BigQuery as a backend option, providing users with flexibility in choosing their data storage and query engine. Credential-Based Authentication: Relies on Google Cloud credentials (JSON file) for secure authentication and authorization to BigQuery resources. Test Suite Comes with a test suite that can be can be run as a standalone instance via pytest or also run as an integrated CKAN plugin via nosetests. Technical Integration: The extension integrates into CKAN as a plugin. You will need to enable it in the .ini configuration file. The extension uses Google Cloud credentials to authenticate and authorize access to BigQuery, enabling seamless data access and querying within the CKAN environment. Benefits & Impact: This extension is valuable for CKAN deployments dealing with big datasets hosted in BigQuery, offering potentially significant performance and scalability benefits compared to CKAN's default datastore implementation. The ability to use BigQuery as the data backend removes dependency / limitations on the CKAN datastore.

  14. Human Variant Annotation Datasets

    • console.cloud.google.com
    Updated Jul 16, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&inv=1&invt=Ab3i5A (2022). Human Variant Annotation Datasets [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/human-variant-annotation-public
    Explore at:
    Dataset updated
    Jul 16, 2022
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    These datasets are important to genomics researchers because they characterize several aspects of what the scientific community has learned to date about human sequence variants. Making this human annotation data freely available in GCP will enable researchers to focus less on data movement and management tasks associated with procuring this data and instead make immediate use of the data to better understand the clinical relevance of particular variant such as disease causing or protective variants (ClinVar), search a catalog of SNPs that have been identified in the human genome (dbSNP), and discover how frequently a particular variant occurs across the human population (1000Genomes, ESP, ExAC, gnomAD) This human annotation dataset contains both a mirror of the original Variant Call Files (VCF) files from NCBI, NHLBI Exome Sequencing Project (ESP) and ensembl as Google Cloud Storage (GCS) objects. In addition, these human sequence variants have also been translated into a particular variant table format and made available in Google BigQuery giving researchers the ability to use cloud technology and code repositories such as the Verily Life Sciences Annotation Toolkit to perform analyses in parallel. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . This public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.

  15. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?

  16. 1000 Cannabis Genomes Project

    • kaggle.com
    zip
    Updated Feb 26, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). 1000 Cannabis Genomes Project [Dataset]. https://www.kaggle.com/bigquery/genomics-cannabis
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 26, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Cannabis is a genus of flowering plants in the family Cannabaceae.

    Source: https://en.wikipedia.org/wiki/Cannabis

    Content

    In October 2016, Phylos Bioscience released a genomic open dataset of approximately 850 strains of Cannabis via the Open Cannabis Project. In combination with other genomics datasets made available by Courtagen Life Sciences, Michigan State University, NCBI, Sunrise Medicinal, University of Calgary, University of Toronto, and Yunnan Academy of Agricultural Sciences, the total amount of publicly available data exceeds 1,000 samples taken from nearly as many unique strains.

    https://medium.com/google-cloud/dna-sequencing-of-1000-cannabis-strains-publicly-available-in-google-bigquery-a33430d63998

    These data were retrieved from the National Center for Biotechnology Information’s Sequence Read Archive (NCBI SRA), processed using the BWA aligner and FreeBayes variant caller, indexed with the Google Genomics API, and exported to BigQuery for analysis. Data are available directly from Google Cloud Storage at gs://gcs-public-data--genomics/cannabis, as well as via the Google Genomics API as dataset ID 918853309083001239, and an additional duplicated subset of only transcriptome data as dataset ID 94241232795910911, as well as in the BigQuery dataset bigquery-public-data:genomics_cannabis.

    All tables in the Cannabis Genomes Project dataset have a suffix like _201703. The suffix is referred to as [BUILD_DATE] in the descriptions below. The dataset is updated frequently as new releases become available.

    The following tables are included in the Cannabis Genomes Project dataset:

    Sample_info contains fields extracted for each SRA sample, including the SRA sample ID and other data that give indications about the type of sample. Sample types include: strain, library prep methods, and sequencing technology. See SRP008673 for an example of upstream sample data. SRP008673 is the University of Toronto sequencing of Cannabis Sativa subspecies Purple Kush.

    MNPR01_reference_[BUILD_DATE] contains reference sequence names and lengths for the draft assembly of Cannabis Sativa subspecies Cannatonic produced by Phylos Bioscience. This table contains contig identifiers and their lengths.

    MNPR01_[BUILD_DATE] contains variant calls for all included samples and types (genomic, transcriptomic) aligned to the MNPR01_reference_[BUILD_DATE] table. Samples can be found in the sample_info table. The MNPR01_[BUILD_DATE] table is exported using the Google Genomics BigQuery variants schema. This table is useful for general analysis of the Cannabis genome.

    MNPR01_transcriptome_[BUILD_DATE] is similar to the MNPR01_[BUILD_DATE] table, but it includes only the subset transcriptomic samples. This table is useful for transcribed gene-level analysis of the Cannabis genome.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: http://opencannabisproject.org/ Category: Genomics Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://www.ncbi.nlm.nih.gov/home/about/policies.shtml - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. Update frequency: As additional data are released to GenBank View in BigQuery: https://bigquery.cloud.google.com/dataset/bigquery-public-data:genomics_cannabis View in Google Cloud Storage: gs://gcs-public-data--genomics/cannabis

    Banner Photo by Rick Proctor from Unplash.

    Inspiration

    Which Cannabis samples are included in the variants table?

    Which contigs in the MNPR01_reference_[BUILD_DATE] table have the highest density of variants?

    How many variants does each sample have at the THC Synthase gene (THCA1) locus?

  17. Ethereum Classic Cryptocurrency Dataset - Dataset - CryptoData Hub

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    cryptodata.center (2024). Ethereum Classic Cryptocurrency Dataset - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/ethereum-classic-cryptocurrency-dataset
    Explore at:
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    Description

    Ethereum Classic is a cryptocurrency with shared history with the Ethereum cryptocurrency. On technical merits, the two cryptocurrencies are nearly identical, differing only in programming language features supported by the Ethereum Virtual machine which is used to write smart contracts. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system. Interested in learning more about how Cloud Public Data is working to make data from blockchains and cryptocurrencies more accessible? Check out our blog post on the Google Cloud Big Data Blog and try the sample query below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  18. PatentsView Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). PatentsView Data [Dataset]. https://www.kaggle.com/bigquery/patentsview
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    The USPTO grants US patents to inventors and assignees all over the world. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate redundant cleaning, converting and matching of these data by individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change.

    Content

    PatentsView Data is a database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    “PatentsView” by the USPTO, US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic, used under CC BY 4.0.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patentsview

    Banner photo by rawpixel on Unsplash

  19. Dogecoin Crypto Blockchain

    • kaggle.com
    zip
    Updated Feb 14, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Dogecoin Crypto Blockchain [Dataset]. https://www.kaggle.com/bigquery/crypto-dogecoin
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 14, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Dogecoin is an open source peer-to-peer digital currency, favored by Shiba Inus worldwide. It is qualitatively more fun while being technically nearly identical to its close relative Bitcoin. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system.

    Content

    You can access the data from BigQuery in your notebook with bigquery-public-data.crypto_dogecoin dataset.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_dogecoin.[TABLENAME].

    Acknowledgements

    This dataset wouldn't be possible without the help of BigQuery and all of their contributions to public data.

  20. USPTO OCE Patent Assignment Dataset

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO OCE Patent Assignment Dataset [Dataset]. https://www.kaggle.com/bigquery/uspto-oce-assignment
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    The Office of the Chief Economist (OCE) is responsible for advising the Under Secretary of Commerce for Intellectual Property and Director of the USPTO on the economic implications of policies and programs affecting the U.S. intellectual property (IP) system. The office disseminates detailed patent and trademark data, undertakes research, and conducts economic analysis on a variety of IP issues. OCE works with policy makers, collaborates with academics, and engages the public more generally through conferences it organizes, the publicly accessible research datasets it provides, and its publications.

    Content

    The USPTO OCE Patent Assignment Dataset contains detailed data patent assignments and other transactions recorded at the USPTO since 1970.

    Acknowledgements

    "USPTO OCE Patent Assignment Data" by the USPTO, for public use. Marco, Alan C., Graham, Stuart J.H., Myers, Amanda F., D'Agostino, Paul A and Apple, Kirsten, "The USPTO Patent Assignment Dataset: Descriptions and Analysis" (July 27, 2015).

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_assignment

    Banner photo by Jeff Sheldon on Unsplash

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de&inv=1&invt=Ab2chA (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
Organization logoOrganization logo

Google Ads Transparency Center

Explore at:
11 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Apr 11, 2023
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Description

This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

Search
Clear search
Close search
Google apps
Main menu