License: Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and has grown to more than two million registered users, who can add data by manual survey, GPS devices, aerial photography, and other free sources.
To aid researchers, data scientists, and analysts in the effort to combat COVID-19, Google is making a hosted repository of public datasets, including OpenStreetMap data, free to access. To give the Kaggle community access to the BigQuery dataset, it has been onboarded to the Kaggle platform, which allows querying it without a linked GCP account. Please note that, due to the large size of the dataset, Kaggle applies a quota of 5 TB of data scanned per user per 30 days.
This is the OpenStreetMap (OSM) planet-wide dataset loaded to BigQuery.
Tables:
- history_* tables: full history of OSM objects.
- planet_* tables: snapshot of current OSM objects as of Nov 2019.
The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types, plus a changeset table corresponding to OSM edits, included for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated on with the built-in geography functions for geometry and feature selection and additional processing.
You can read more about OSM elements on the OSM Wiki. This dataset uses the BigQuery GEOGRAPHY data type, which supports a set of functions that can be used to analyze geographical data, determine spatial relationships between geographical features, and construct or manipulate GEOGRAPHY values.
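For instance, a minimal sketch of a feature-selection query (this assumes a planet_features table with an all_tags ARRAY<STRUCT<key, value>> column and a GEOGRAPHY geometry column; verify the names against the actual schema):
```SQL
-- Count OSM features tagged amenity=hospital within 10 km of a point
-- in Manhattan. Table and column names (planet_features, all_tags,
-- geometry) are assumptions about the dataset layout.
SELECT
  COUNT(*) AS hospitals_nearby
FROM
  `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE
  EXISTS (
    SELECT 1
    FROM UNNEST(all_tags) AS tag
    WHERE tag.key = 'amenity' AND tag.value = 'hospital')
  -- ST_DWITHIN checks whether two GEOGRAPHYs lie within N meters.
  AND ST_DWITHIN(geometry, ST_GEOGPOINT(-73.97, 40.78), 10000);
```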
License: Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and has grown to more than two million registered users, who can add data by manual survey, GPS devices, aerial photography, and other free sources.
We've made available a number of tables (explained in detail below):
- history_* tables: full history of OSM objects.
- planet_* tables: snapshot of current OSM objects as of Nov 2019.
The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types, plus a changeset table corresponding to OSM edits, included for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated on with the built-in geography functions for geometry and feature selection and additional processing. An example analysis is given below.
This dataset is part of a larger effort to make data available in BigQuery through the Google Cloud Public Datasets program. OSM itself is produced as a public good by volunteers, and there are no guarantees about data quality.
Interested in learning more about how these data were brought into BigQuery and how you can use them? Check out the sample query below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo free tier: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
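For example, a starter query of this shape counts changesets per year from the full-history tables (a sketch only: it assumes history_changesets exposes a created_at TIMESTAMP column, which should be verified against the dataset schema):
```SQL
-- Count OSM changesets per year from the full-history tables.
-- The history_changesets table and created_at column are assumptions
-- about the layout; check the schema before running.
SELECT
  EXTRACT(YEAR FROM created_at) AS edit_year,
  COUNT(*) AS changesets
FROM
  `bigquery-public-data.geo_openstreetmap.history_changesets`
GROUP BY edit_year
ORDER BY edit_year;
```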
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
ChEMBL is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.
ChEMBL is a manually curated database of bioactive molecules with drug-like properties used in drug discovery, including information about existing patented drugs.
Schema: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/chembl_23_schema.png
Documentation: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/schema_documentation.html
Fork this notebook to get started accessing data in the BigQuery dataset, using the BQhelper package to write SQL queries.
“ChEMBL” by the European Bioinformatics Institute (EMBL-EBI), used under CC BY-SA 3.0. Modifications have been made to add normalized publication numbers.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:ebi_chembl
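As a starting point, a minimal sketch of a query against this dataset (the molecule_dictionary_23 table name assumes the ChEMBL 23 release suffix; verify the exact table name and column types in BigQuery, where some numeric fields may be stored as strings):
```SQL
-- List ten named molecules with their maximum clinical development phase.
-- Table and column names follow the ChEMBL 23 schema linked above.
SELECT
  pref_name,
  max_phase
FROM
  `patents-public-data.ebi_chembl.molecule_dictionary_23`
WHERE
  pref_name IS NOT NULL
LIMIT 10;
```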
License: Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
-- SQL queries for the ETL process.
-- Creating the first target table, capturing the entire year.
SELECT
TRI.usertype,
ZIPSTART.zip_code AS zip_code_start,
ZIPSTARTNAME.borough AS borough_start,
ZIPSTARTNAME.neighborhood AS neighborhood_start,
ZIPEND.zip_code AS zip_code_end,
ZIPENDNAME.borough AS borough_end,
ZIPENDNAME.neighborhood AS neighborhood_end,
-- Since this is a fictional dashboard, we will add 6 years to make it look recent
DATE_ADD(DATE(TRI.starttime), INTERVAL 6 YEAR) AS start_day,
DATE_ADD(DATE(TRI.stoptime), INTERVAL 6 YEAR) AS stop_day,
WEA.temp AS day_mean_temperature, -- Mean temperature
WEA.wdsp AS day_mean_wind_speed, -- Mean wind speed
WEA.prcp AS day_total_precipitation, -- Total precipitation
-- Group trips into 10-minute intervals to reduce the number of rows
ROUND(CAST(TRI.tripduration / 60 AS INT64), -1) AS trip_minutes,
COUNT(TRI.bikeid) AS trip_count
FROM
`bigquery-public-data.new_york_citibike.citibike_trips` AS TRI
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPSTART
ON ST_WITHIN(
ST_GEOGPOINT(TRI.start_station_longitude, TRI.start_station_latitude),
ZIPSTART.zip_code_geom)
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPEND
ON ST_WITHIN(
ST_GEOGPOINT(TRI.end_station_longitude, TRI.end_station_latitude),
ZIPEND.zip_code_geom)
INNER JOIN
`bigquery-public-data.noaa_gsod.gsod20*` AS WEA
ON PARSE_DATE("%Y%m%d", CONCAT(WEA.year, WEA.mo, WEA.da)) = DATE(TRI.starttime)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPSTARTNAME
ON ZIPSTART.zip_code = CAST(ZIPSTARTNAME.zip AS STRING)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPENDNAME
ON ZIPEND.zip_code = CAST(ZIPENDNAME.zip AS STRING)
WHERE
-- This takes the weather data from the New York Central Park weather station, ID 94728
WEA.wban = '94728'
-- Use data from 2014 and 2015
AND EXTRACT(YEAR FROM DATE(TRI.starttime)) BETWEEN 2014 AND 2015
GROUP BY
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13;
-- Creating the second target table, capturing the summer season.
-- We define summer as June to August.
SELECT
TRI.usertype,
TRI.start_station_longitude,
TRI.start_station_latitude,
TRI.end_station_longitude,
TRI.end_station_latitude,
ZIPSTART.zip_code AS zip_code_start,
ZIPSTARTNAME.borough AS borough_start,
ZIPSTARTNAME.neighborhood AS neighborhood_start,
ZIPEND.zip_code AS zip_code_end,
ZIPENDNAME.borough AS borough_end,
ZIPENDNAME.neighborhood AS neighborhood_end,
-- Since we're using trips from 2014 and 2015, we will add 6 years to make it look recent
DATE_ADD(DATE(TRI.starttime), INTERVAL 6 YEAR) AS start_day,
DATE_ADD(DATE(TRI.stoptime), INTERVAL 6 YEAR) AS stop_day,
WEA.temp AS day_mean_temperature, -- Mean temperature
WEA.wdsp AS day_mean_wind_speed, -- Mean wind speed
WEA.prcp AS day_total_precipitation, -- Total precipitation
-- We will group trips into 10-minute intervals, which also reduces the number of rows
ROUND(CAST(TRI.tripduration / 60 AS INT64), -1) AS trip_minutes,
TRI.bikeid
FROM
`bigquery-public-data.new_york_citibike.citibike_trips` AS TRI
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPSTART
ON ST_WITHIN(
ST_GEOGPOINT(TRI.start_station_longitude, TRI.start_station_latitude),
ZIPSTART.zip_code_geom)
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPEND
ON ST_WITHIN(
ST_GEOGPOINT(TRI.end_station_longitude, TRI.end_station_latitude),
ZIPEND.zip_code_geom)
INNER JOIN
`bigquery-public-data.noaa_gsod.gsod20*` AS WEA
ON PARSE_DATE("%Y%m%d", CONCAT(WEA.year, WEA.mo, WEA.da)) = DATE(TRI.starttime)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPSTARTNAME
ON ZIPSTART.zip_code = CAST(ZIPSTARTNAME.zip AS STRING)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPENDNAME
ON ZIPEND.zip_code = CAST(ZIPENDNAME.zip AS STRING)
WHERE
-- Take the weather from the same New York Central Park weather station, ID 94728
WEA.wban = '94728'
-- Use data for the three summer months
AND DATE(TRI.starttime) BETWEEN DATE('2015-06-01') AND DATE('2015-08-31');
From the public dataset in BigQuery (bigquery-public-data.sunroof_solar.solar_potential_by_censustract). Duplicate entries were removed, along with entries containing null values.
The data was cleaned using the following script:
```SQL
WITH solar AS (
  SELECT *
  FROM (
    SELECT rn, region_name, count_qualified
    FROM (
      SELECT
        ROW_NUMBER() OVER (PARTITION BY region_name ORDER BY count_qualified DESC) AS rn,
        region_name,
        count_qualified,
        kw_total
      FROM `bigquery-public-data.sunroof_solar.solar_potential_by_censustract`)
    WHERE kw_total IS NOT NULL)
  WHERE rn = 1)
SELECT b.*
FROM solar s
JOIN `bigquery-public-data.sunroof_solar.solar_potential_by_censustract` b
  ON s.region_name = b.region_name
  AND s.count_qualified = b.count_qualified
ORDER BY s.region_name;
```
To filter out regions in Alaska, Hawaii, and Puerto Rico (dataset solar_contiguous), add:
```SQL
WHERE SUBSTRING(region_name, 1, 2) NOT IN ('02', '15', '72')
```
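Slotted into the cleaning script above, the filter belongs in the final SELECT (a sketch reusing the solar CTE; the first two characters of region_name are the state FIPS code, with 02 = Alaska, 15 = Hawaii, 72 = Puerto Rico):
```SQL
-- Final SELECT of the cleaning script, restricted to the contiguous US.
SELECT b.*
FROM solar s
JOIN `bigquery-public-data.sunroof_solar.solar_potential_by_censustract` b
  ON s.region_name = b.region_name
  AND s.count_qualified = b.count_qualified
WHERE SUBSTRING(b.region_name, 1, 2) NOT IN ('02', '15', '72')
ORDER BY s.region_name;
```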
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is just a copy of the Methane Emissions Around The World (1990-2018) dataset provided by @koustubhk. The only difference is that I renamed the "2018" through "1990" columns by manually adding the word "year" to each year column name.
This was done for the sole purpose of making the dataset loadable into BigQuery. BigQuery's schema auto-detection assigns headers by detecting strings in the first row of a .csv file. Since the original .csv file has numerical values in its first row, schema detection produces a detection error: the dataset is either loaded with generic column names such as "string_field_0", or the load fails if the schema was edited manually.
The solution used here is to transform the header values into string values. The dataset can now be uploaded to BigQuery with no error messages and no generic column names.
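For reference, schema auto-detection can also be bypassed by naming the columns explicitly in a LOAD DATA statement; a minimal sketch, in which the dataset, table, bucket path, and column list are all hypothetical placeholders:
```SQL
-- Load the CSV with an explicit column list, skipping the header row,
-- so BigQuery never has to infer the schema from the first row.
-- my_dataset.methane and the gs:// path below are placeholders.
LOAD DATA INTO my_dataset.methane (
  country STRING,
  year_2018 FLOAT64,
  year_2017 FLOAT64
)
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-bucket/methane_emissions.csv']
);
```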