The Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. It includes the Top 25 stories and Top 25 Rising queries from Google Trends, made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days and is accompanied by a rolling five-year window of historical data for 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
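As a quick starting point, the sketch below pulls the most recent set of top terms; it assumes the marketplace table `bigquery-public-data.google_trends.top_terms` and its `term`, `rank`, and `refresh_date` fields, which you should verify against the listing before running:
SELECT
  term,
  rank
FROM
  `bigquery-public-data.google_trends.top_terms`
WHERE
  /* keep only the latest daily refresh */
  refresh_date = (
    SELECT MAX(refresh_date)
    FROM `bigquery-public-data.google_trends.top_terms`)
GROUP BY
  term, rank /* collapse the per-DMA, per-week duplicates */
ORDER BY
  rank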
CSV version of Looker Ecommerce Dataset.
Overview
Dataset in BigQuery
TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
distribution_centers.csv
- id: Unique identifier for each distribution center.
- name: Name of the distribution center.
- latitude: Latitude coordinate of the distribution center.
- longitude: Longitude coordinate of the distribution center.

events.csv
- id: Unique identifier for each event.
- user_id: Identifier for the user associated with the event.
- sequence_number: Sequence number of the event.
- session_id: Identifier for the session during which the event occurred.
- created_at: Timestamp indicating when the event took place.
- ip_address: IP address from which the event originated.
- city: City where the event occurred.
- state: State where the event occurred.
- postal_code: Postal code of the event location.
- browser: Web browser used during the event.
- traffic_source: Source of the traffic leading to the event.
- uri: Uniform Resource Identifier associated with the event.
- event_type: Type of event recorded.

inventory_items.csv
- id: Unique identifier for each inventory item.
- product_id: Identifier for the associated product.
- created_at: Timestamp indicating when the inventory item was created.
- sold_at: Timestamp indicating when the item was sold.
- cost: Cost of the inventory item.
- product_category: Category of the associated product.
- product_name: Name of the associated product.
- product_brand: Brand of the associated product.
- product_retail_price: Retail price of the associated product.
- product_department: Department to which the product belongs.
- product_sku: Stock Keeping Unit (SKU) of the product.
- product_distribution_center_id: Identifier for the distribution center associated with the product.

order_items.csv
- id: Unique identifier for each order item.
- order_id: Identifier for the associated order.
- user_id: Identifier for the user who placed the order.
- product_id: Identifier for the associated product.
- inventory_item_id: Identifier for the associated inventory item.
- status: Status of the order item.
- created_at: Timestamp indicating when the order item was created.
- shipped_at: Timestamp indicating when the order item was shipped.
- delivered_at: Timestamp indicating when the order item was delivered.
- returned_at: Timestamp indicating when the order item was returned.

orders.csv
- order_id: Unique identifier for each order.
- user_id: Identifier for the user who placed the order.
- status: Status of the order.
- gender: Gender information of the user.
- created_at: Timestamp indicating when the order was created.
- returned_at: Timestamp indicating when the order was returned.
- shipped_at: Timestamp indicating when the order was shipped.
- delivered_at: Timestamp indicating when the order was delivered.
- num_of_item: Number of items in the order.

products.csv
- id: Unique identifier for each product.
- cost: Cost of the product.
- category: Category to which the product belongs.
- name: Name of the product.
- brand: Brand of the product.
- retail_price: Retail price of the product.
- department: Department to which the product belongs.
- sku: Stock Keeping Unit (SKU) of the product.
- distribution_center_id: Identifier for the distribution center associated with the product.

users.csv
- id: Unique identifier for each user.
- first_name: First name of the user.
- last_name: Last name of the user.
- email: Email address of the user.
- age: Age of the user.
- gender: Gender of the user.
- state: State where the user lives.
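Because the same schema is also hosted in BigQuery, you can explore it with SQL before touching the CSVs. A minimal sketch ranking brands by revenue from completed orders follows; the dataset path `bigquery-public-data.thelook_ecommerce` and the status value 'Complete' are assumptions based on the public BigQuery copy, so verify them against your own copy of the data:
SELECT
  p.brand,
  COUNT(*) AS items_sold,
  ROUND(SUM(p.retail_price), 2) AS revenue /* revenue approximated from retail price */
FROM
  `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN
  `bigquery-public-data.thelook_ecommerce.products` p
ON
  oi.product_id = p.id
WHERE
  oi.status = 'Complete' /* assumed status label for fulfilled items */
GROUP BY
  p.brand
ORDER BY
  revenue DESC
LIMIT
  10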
https://creativecommons.org/publicdomain/zero/1.0/
In this release you will find data about software distributed and/or crafted publicly on the Internet. You will find information about its development, its distribution and its relationship with other software included as a dependency. You will not find any information about the individuals who create and maintain these projects.
Libraries.io gathers data on open source software from 33 package managers and 3 source code repositories. We track over 2.4m unique open source projects, 25m repositories and 121m interdependencies between them. This gives Libraries.io a unique understanding of open source software.
Fork this kernel to get started with this dataset.
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://libraries.io/data — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
https://console.cloud.google.com/marketplace/details/libraries-io/librariesio
Banner Photo by Caspar Rubin from Unsplash.
What are the repositories, avg project size, and avg # of stars?
What are the top dependencies per platform?
What are the top unmaintained or deprecated projects?
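For the second question, a sketch that ranks the five most-depended-upon projects per platform; the table `bigquery-public-data.libraries_io.dependencies` and its `platform` and `dependency_name` columns are assumptions to verify against the marketplace listing:
SELECT
  platform,
  dependency_name,
  COUNT(*) AS dependent_count /* dependency edges pointing at this project */
FROM
  `bigquery-public-data.libraries_io.dependencies`
GROUP BY
  platform, dependency_name
/* keep the five most-depended-upon projects per platform */
QUALIFY
  ROW_NUMBER() OVER (PARTITION BY platform ORDER BY COUNT(*) DESC) <= 5
ORDER BY
  platform, dependent_count DESC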
This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
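For example, a sketch ranking countries by adult literacy; the table `bigquery-public-data.world_bank_intl_education.international_education` and the World Bank indicator code SE.ADT.LITR.ZS are assumptions to confirm against the listing:
SELECT
  country_name,
  year,
  value AS adult_literacy_rate /* percent of people ages 15 and above */
FROM
  `bigquery-public-data.world_bank_intl_education.international_education`
WHERE
  indicator_code = 'SE.ADT.LITR.ZS'
  AND year = 2015
ORDER BY
  value DESC
LIMIT
  10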
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
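For example, the sketch below counts the most-used tags on questions asked in 2022, assuming the `bigquery-public-data.stackoverflow.posts_questions` table, where `tags` is a pipe-delimited string:
SELECT
  tag,
  COUNT(*) AS questions
FROM
  `bigquery-public-data.stackoverflow.posts_questions`,
  UNNEST(SPLIT(tags, '|')) AS tag /* one row per tag on each question */
WHERE
  EXTRACT(YEAR FROM creation_date) = 2022
GROUP BY
  tag
ORDER BY
  questions DESC
LIMIT
  10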
https://www.usa.gov/government-works/
This dataset is a cleaned-up extract from the following public BigQuery dataset: https://console.cloud.google.com/marketplace/details/noaa-public/ghcn-d
The dataset contains daily min/max temperatures from a selection of 1666 weather stations. The data spans exactly 50 years. Missing values have been interpolated and are marked as such.
This dataset is in TFRecord format.
About the original dataset: NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. GHCN-Daily comprises daily climate records from over 100,000 stations in 180 countries and territories, and includes some data from every year since 1763.
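If you prefer to query the source data directly rather than use this TFRecord extract, a sketch against the original BigQuery dataset follows; it assumes the `ghcn_d` tables linked above and the GHCN convention that temperatures are stored in tenths of a degree Celsius:
SELECT
  s.name,
  MAX(w.value / 10) AS tmax_celsius /* GHCN stores tenths of degrees C */
FROM
  `bigquery-public-data.ghcn_d.ghcnd_2020` w
JOIN
  `bigquery-public-data.ghcn_d.ghcnd_stations` s
ON
  w.id = s.id
WHERE
  w.element = 'TMAX'
  AND w.qflag IS NULL /* drop values that failed quality assurance */
GROUP BY
  s.name
ORDER BY
  tmax_celsius DESC
LIMIT
  10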
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT
    country_name,
    life_expectancy
  FROM
    `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE
    year = 2016) age
INNER JOIN (
  SELECT
    country_name,
    country_area
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE
    country_area > 25000) size
ON
  age.country_name = size.country_name
ORDER BY
  2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT
  10
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT
    country_name,
    population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE
    year = 2017
    AND age < 25) age
INNER JOIN (
  SELECT
    midyear_population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE
    year = 2017) pop
ON
  age.country_code = pop.country_code
GROUP BY
  1,
  3
ORDER BY
  4 DESC
/* Remove limit for visualization */
LIMIT
  10
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.
SELECT
  growth.country_name,
  growth.net_migration,
  CAST(area.country_area AS INT64) AS country_area
FROM (
  SELECT
    country_name,
    net_migration,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
  WHERE
    year = 2017) growth
INNER JOIN (
  SELECT
    country_area,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE
    country_area > 500) area
ON
  growth.country_code = area.country_code
Update frequency: Historic (none)
Dataset source: United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
UPDATE: The Community Mobility Reports are no longer being updated as of October 15, 2022. All historical data will remain publicly available for research purposes. This dataset aims to provide insights into what has changed in response to policies aimed at combating COVID-19. It reports movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. This dataset is intended to help remediate the impact of COVID-19. It shouldn’t be used for medical diagnostic, prognostic, or treatment purposes. It also isn’t intended to be used for guidance on personal travel plans. To learn more about the dataset, the place categories, and how we calculate these trends and preserve privacy, visit our help center or read the data documentation. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
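For example, a sketch pulling the US national workplace and residential trends for spring 2020; it assumes the marketplace table `bigquery-public-data.covid19_google_mobility.mobility_report`, where country-level rows have a NULL `sub_region_1`:
SELECT
  date,
  workplaces_percent_change_from_baseline AS workplaces,
  residential_percent_change_from_baseline AS residential
FROM
  `bigquery-public-data.covid19_google_mobility.mobility_report`
WHERE
  country_region = 'United States'
  AND sub_region_1 IS NULL /* keep only the national rows */
  AND date BETWEEN '2020-03-01' AND '2020-06-30'
ORDER BY
  date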
The Metropolitan Museum of Art, better known as the Met, provides a public domain dataset with over 200,000 objects including metadata and images. In early 2017, the Met debuted their Open Access policy to make part of their collection freely available for unrestricted use under the Creative Commons Zero designation and their own terms and conditions. This dataset provides a new view into one of the world’s premier collections of fine art. The data includes both images in Google Cloud Storage and associated structured data in two BigQuery tables, objects and images (1:N). Locations of images both on The Met’s website and in Google Cloud Storage are available in the BigQuery table. The metadata for this public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. The image data for this public dataset is hosted in Google Cloud Storage and is available free to use. Use this quick start guide to learn how to access public datasets on Google Cloud Storage.
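For example, a sketch counting objects per curatorial department; the table `bigquery-public-data.the_met.objects` and its `department` column are assumptions to verify against the listing:
SELECT
  department,
  COUNT(*) AS objects
FROM
  `bigquery-public-data.the_met.objects`
GROUP BY
  department
ORDER BY
  objects DESC
LIMIT
  10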
https://creativecommons.org/publicdomain/zero/1.0/
The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions of the original HIPAA act. The primary purpose of NPPES was to develop a unique identifier for each physician that billed Medicare and Medicaid. This identifier is now known as the National Provider Identifier Standard (NPI), a required 10-digit number that is unique to an individual provider at the national level.
Once an NPI record is assigned to a healthcare provider, the parts of the NPI record that have public relevance, including the provider’s name, specialty, and practice address, are published in a searchable website, as well as in a downloadable file of zipped data containing all of the FOIA-disclosable health care provider data in NPPES and a separate PDF file of code values that documents the descriptions for all of the codes found in the data file.
The dataset contains the latest NPI downloadable file in an easy-to-query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized, which harnesses the power of BigQuery’s next-generation columnar storage format to provide an analytical view of the NPI data, containing description fields for the codes based on the mappings in the Data Dissemination Public File - Code Values documentation, as well as external lookups to the healthcare provider taxonomy codes. While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.
Fork this kernel to get started.
https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research
Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @rawpixel from Unsplash.
What are the top ten most common types of physicians in Mountain View?
What are the names and phone numbers of dentists in California who studied public health?
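As a sketch of the kind of aggregation the single lookup table enables, the query below counts providers per practice state; the dataset path `bigquery-public-data.nppes.npi_optimized` and the snake_cased column name are assumptions, so check the table schema before running:
SELECT
  /* assumed snake_cased form of the NPPES practice-state field */
  provider_business_practice_location_address_state_name AS state,
  COUNT(*) AS providers
FROM
  `bigquery-public-data.nppes.npi_optimized`
GROUP BY
  state
ORDER BY
  providers DESC
LIMIT
  10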
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information on how much money is spent by verified advertisers on political advertising across Google Ad Services. In addition, insights on demographic targeting used in political ad campaigns by these advertisers are also provided. Finally, links to the actual political ad in the Google Transparency Report (https://adstransparency.google.com) are provided. Data for an election expires 7 years after the election. After this point, the data are removed from the dataset and are no longer available.
Update frequency: Daily
Dataset source: Transparency Report: Political Advertising on Google
Terms of use:
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/transparency-report/google-political-ads
For more information see: The Political Advertising on Google Transparency Report at https://adstransparency.google.com
The supporting Frequently Asked Questions at https://support.google.com/transparencyreport/answer/9575640?hl=en&ref_topic=7295796
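For example, a sketch ranking verified advertisers by total spend; it assumes the marketplace table `bigquery-public-data.google_political_ads.advertiser_weekly_spend` with a `spend_usd` column:
SELECT
  advertiser_name,
  SUM(spend_usd) AS total_spend_usd
FROM
  `bigquery-public-data.google_political_ads.advertiser_weekly_spend`
GROUP BY
  advertiser_name
ORDER BY
  total_spend_usd DESC
LIMIT
  10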
The Forest Inventory and Analysis dataset is a nationwide survey of the forest assets of the United States. The Forest Inventory and Analysis (FIA) research program has been in existence since it was mandated by Congress in 1928. FIA's primary objective is to determine the extent, condition, volume, growth, and use of trees on the Nation's forest land. This dataset includes the most recent data available from the USFS datamart; it does not include historical data. Original field names have been expanded to full names, and code values have been expanded to full names in all tables; in addition, each table contains data from all states. A full description of the original tables is available from the USFS. A user's guide with example summary reports is also available from the USFS. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
Methodology
A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with BigQuery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
Query or Download
The data is publicly accessible in BigQuery in the following two tables:
When you make queries on these tables, make sure that you are in your own Google Cloud project; otherwise, the queries will fail.
See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.
Code
The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language
License
COKI Language Dataset © 2022 by Curtin University is licensed under CC BY 4.0.
Attributions
This work contains information from:
References
[1] https://doi.org/10.5281/zenodo.6366695
[2] https://fasttext.cc/docs/en/language-identification.html
[3] https://modelpredict.com/language-identification-survey
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summarized data and graphs. The raw data is often difficult to obtain: it is typically divided by region, and it must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)
Dataset source: United States Census Bureau
SELECT
  zipcode,
  population
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE
  gender = ''
ORDER BY
  population DESC
LIMIT
  10
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data
https://choosealicense.com/licenses/other/
NOTE: All this data, plus a lot more, is now accessible at https://console.cloud.google.com/marketplace/product/bigquery-public-data/eumetsat-seviri-rss-hrv-uk?project=tactile-acrobat-249716 That dataset is the preferred way to access this data, as it goes back to the beginning of the RSS archive (2008-2023) and is updated on a roughly weekly basis. This dataset consists of the EUMETSAT Rapid Scan Service (RSS) imagery for 2014 to Feb 2023. This data has 2 formats, the High Resolution Visible… See the full description on the dataset page: https://huggingface.co/datasets/openclimatefix/eumetsat-rss.
This dataset contains all stories and comments from Hacker News from its launch in 2006 to present. Each story contains a story ID, the author who made the post, when it was written, and the number of points the story received. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
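For example, a sketch returning the highest-scoring stories of all time; it assumes the `bigquery-public-data.hacker_news.full` table (older copies split stories and comments into separate tables):
SELECT
  title,
  score
FROM
  `bigquery-public-data.hacker_news.full`
WHERE
  type = 'story' /* the table mixes stories, comments, and polls */
  AND score IS NOT NULL
ORDER BY
  score DESC
LIMIT
  10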
https://www.usa.gov/government-works/
Chicago is one of America's most iconic cities, with a rich and colorful history. Recently, Chicago was also a setting for one of Netflix's popular series: Ozark. The story has it that Chicago is the center of drug distribution for the Navarro cartel.
So, how true is the series? A quick search on the internet turns up a recently released DEA report. The report shows that drug crime does exist in Chicago, although the drugs are distributed by the Cartel de Jalisco Nueva Generacion, the Sinaloa Cartel and the Guerreros Unidos, to name a few.
The government of the City of Chicago provides a publicly available crime database accessible via Google BigQuery. I have downloaded a subset of the data with `crime_type` = narcotics and `year` > 2015. The data contains records from 1 Jan 2016 UTC until 23 Jul 2020 UTC.
The dataset contains these columns:
- `case_number`: ID of the record
- `date`: Date of incident
- `iucr`: Category of the crime, per the Illinois Uniform Crime Reporting (IUCR) code. [more](https://data.cityofchicago.org/widgets/c7ck-438e)
- `description`: More detailed description of the crime
- `location_description`: Location of the crime
- `arrest`: Whether an arrest was made
- `domestic`: Was the crime domestic?
- `district`: The district code where the crime happened. [more](https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Districts-current-/fthy-xz3r)
- `ward`: The ward code where the crime happened. [more](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2015-/sp34-6z76)
- `community_area`: The community area code where the crime happened. more
The data is owned and kindly provided by the City of Chicago.
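If you want to reproduce or extend this subset directly from the source, a sketch against the public BigQuery table follows; it assumes the marketplace path `bigquery-public-data.chicago_crime.crime`, where `primary_type` is the full dataset's name for the crime category:
SELECT
  year,
  COUNT(*) AS narcotics_incidents,
  COUNTIF(arrest) AS arrests /* arrest is a BOOL column */
FROM
  `bigquery-public-data.chicago_crime.crime`
WHERE
  primary_type = 'NARCOTICS'
  AND year > 2015 /* same filter as the CSV subset */
GROUP BY
  year
ORDER BY
  year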
Some questions to get you started:
Lastly, if you are:
- a newly recruited analyst at the DEA / police, what would you recommend?
- asked by el jefe del cartel (the boss of the cartel) how to expand operations / operate better, what would you say?
Happy wrangling!
This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
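For example, a sketch counting requests by category; the table path `bigquery-public-data.san_francisco_311.311_service_requests` and its `category` column are assumptions to verify against the listing:
SELECT
  category,
  COUNT(*) AS requests
FROM
  `bigquery-public-data.san_francisco_311.311_service_requests`
GROUP BY
  category
ORDER BY
  requests DESC
LIMIT
  10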
GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects on GitHub since 2008. This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
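For example, a sketch counting repositories by license; it assumes the `bigquery-public-data.github_repos.licenses` table, which maps `repo_name` to `license`:
SELECT
  license,
  COUNT(*) AS repos
FROM
  `bigquery-public-data.github_repos.licenses`
GROUP BY
  license
ORDER BY
  repos DESC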