This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first-shown and last-shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may lag what appears on the Google Ads Transparency Center website.
About BigQuery
This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery.
Download Dataset
This public dataset is also hosted in Google Cloud Storage here and is available free to use. Use this quick start guide to learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file describing the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query; see here for options and instructions. Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions here to download and install the gcloud CLI.
To remove the login requirement, run "$ gcloud config set auth/disable_credentials True"
To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R"
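As a starting point, a minimal query against the creative_stats table might look like the sketch below; the bigquery-public-data.google_ads_transparency_center path is an assumption, so check the BigQuery listing for the exact project and dataset name.
```
-- Preview a few rows from creative_stats (dataset path is an assumption)
SELECT *
FROM `bigquery-public-data.google_ads_transparency_center.creative_stats`
LIMIT 10;
```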
https://creativecommons.org/publicdomain/zero/1.0/
Querying BigQuery tables
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.utility_us.[TABLENAME].
If you're using Python, you can start with this code:
import pandas as pd
from bq_helper import BigQueryHelper

# Create a helper scoped to the public project and the utility_us dataset
bq_assistant = BigQueryHelper("bigquery-public-data", "utility_us")
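From there, a minimal sketch of listing tables and running a small, cost-capped query with bq_helper might look like this; TABLENAME is a placeholder, not a real table name from the dataset.
```
# List the tables in the dataset, then run a small query with a scan cap.
print(bq_assistant.list_tables())
query = "SELECT * FROM `bigquery-public-data.utility_us.TABLENAME` LIMIT 10"
df = bq_assistant.query_to_pandas_safe(query, max_gb_scanned=1)
print(df.head())
```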
https://choosealicense.com/licenses/osl-3.0/
Process to Generate DuckDB Dataset
1. Load Repository Metadata
Read repo_metadata.json from GitHub Public Repository Metadata and normalize the JSON into three lists:
- Repositories → general metadata (stars, forks, license, etc.)
- Languages → repo-language mappings with size
- Topics → repo-topic mappings
Convert lists into Pandas DataFrames: df_repos, df_languages, df_topics.
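A minimal sketch of that normalization step is shown below; the field names (nameWithOwner, stars, forks, license, languages, topics) are assumptions about the JSON structure, not confirmed by the dataset page.
```
import json
import pandas as pd

# Field names here are illustrative assumptions about repo_metadata.json.
with open("repo_metadata.json") as f:
    repos = json.load(f)

repo_rows, lang_rows, topic_rows = [], [], []
for r in repos:
    repo_rows.append({"repo": r.get("nameWithOwner"), "stars": r.get("stars"),
                      "forks": r.get("forks"), "license": r.get("license")})
    for lang in r.get("languages", []):
        lang_rows.append({"repo": r.get("nameWithOwner"),
                          "language": lang.get("name"), "size": lang.get("size")})
    for topic in r.get("topics", []):
        topic_rows.append({"repo": r.get("nameWithOwner"), "topic": topic})

df_repos = pd.DataFrame(repo_rows)
df_languages = pd.DataFrame(lang_rows)
df_topics = pd.DataFrame(topic_rows)
```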
2. Enhance with BigQuery Data
Create a temporary BigQuery table (repo_list)… See the full description on the dataset page: https://huggingface.co/datasets/deepgit/github_meta.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.
Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.
Fork this kernel to get started with this dataset.
Dataset Source: https://archive.org/download/stackexchange
https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow
https://cloud.google.com/bigquery/public-data/stackoverflow
Banner Photo by Caspar Rubin from Unsplash.
What is the percentage of questions that have been answered over the years?
What is the reputation and badge count of users across different tenures on StackOverflow?
What are 10 of the “easier” gold badges to earn?
Which day of the week has most questions answered within an hour?
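As an example of the first question, a sketch of the yearly answered-question share, assuming the posts_questions table and its creation_date and answer_count columns:
```
-- Share of questions with at least one answer, by year
SELECT
  EXTRACT(YEAR FROM creation_date) AS year,
  ROUND(100 * COUNTIF(answer_count > 0) / COUNT(*), 1) AS pct_answered
FROM `bigquery-public-data.stackoverflow.posts_questions`
GROUP BY year
ORDER BY year;
```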
The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated with a “_chr*” suffix). Using the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support. These public datasets are included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
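A sketch of a query against one chromosome shard; the exact table name (release prefix plus “_chr*” suffix) and the start_position column follow Variant Transforms conventions and are assumptions, so check the dataset listing for the shard you need.
```
-- Count variants in a small region of one chromosome-sharded table
SELECT COUNT(*) AS n_variants
FROM `bigquery-public-data.gnomAD.v3_genomes__chr21`
WHERE start_position BETWEEN 5000000 AND 5100000;
```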
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under Section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the subsequent sale by owners, importers, or consignees, of articles that infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked, or misbranded goods, sales at unfairly low prices, other antitrust violations such as price fixing or market division, or goods that violate a standard applicable to such goods.
US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.
Fork this notebook to get started on accessing data in the BigQuery dataset, using the bq_helper package to write SQL queries.
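A minimal sketch of pointing bq_helper at the public project; the project and dataset names follow the Data Origin link below, but the table names are not listed here, so inspect them before querying rather than assuming any.
```
from bq_helper import BigQueryHelper

# Scope the helper to the USITC dataset and list its tables before querying.
usitc = BigQueryHelper("patents-public-data", "usitc_investigations")
print(usitc.list_tables())
# Preview one of the listed tables (TABLENAME is a placeholder):
# usitc.head("TABLENAME", num_rows=5)
```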
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations
"US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.
Banner photo by João Silas on Unsplash
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As installing solar has become less expensive, more homeowners are turning to it as a possible option for decreasing their energy bill. We want to make installing solar panels easy and understandable for anyone. Project Sunroof puts Google's expansive data in mapping and computing resources to use, helping calculate the best solar plan for you. How does it work? When you enter your address, Project Sunroof looks up your home in Google Maps and combines that information with other databases to create your personalized roof analysis. Don’t worry, Project Sunroof doesn't give the address to anybody else. Learn more about Project Sunroof and see the tool at Project Sunroof’s site. Project Sunroof computes how much sunlight hits roofs in a year, based on shading calculations, typical meteorological data, and estimates of the size and shape of the roofs. You can see more details about how solar viability is determined by checking out the methodology here. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery.
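A sketch of a query against the Project Sunroof public tables; the sunroof_solar dataset name and the solar_potential_by_postal_code table and columns are assumptions based on the public listing, so verify them before relying on this.
```
-- Postal codes with the highest estimated yearly sunlight
SELECT region_name, state_name, yearly_sunlight_kwh_total
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
ORDER BY yearly_sunlight_kwh_total DESC
LIMIT 10;
```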
The quality of our water is vital, and understanding the factors that impact it is crucial for both the environment and public health. A new collaboration between Google Cloud and Stream, a consortium of UK water companies with a collective vision to unlock water data, is putting the power of data and AI into the hands of communities, driving transparency and informed decision-making around water quality. This initiative leverages Stream's ever-growing catalogue of water sector data and combines it with Google Cloud's BigQuery and advanced Generative AI. The result? A revolutionary way to access, analyse, and understand complex water quality information.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors/preferences.
This is the list of manipulations performed on the original dataset, published by Möbius.
All the cleaning and rearrangement steps were performed in BigQuery, using SQL functions.
1) After I took a closer look at the source dataset, I realized that for my case study I did not need some of the tables contained in the original archive. Therefore, I decided not to import
- dailyCalories_merged.csv,
- dailyIntensities_merged.csv,
- dailySteps_merged.csv,
as they proved redundant: their content can be found in the dailyActivity_merged.csv file.
In addition, the files
- minutesCaloriesWide_merged.csv,
- minutesIntensitiesWide_merged.csv,
- minuteStepsWide_merged.csv,
were not imported, as they present the same data contained in other files, only in a wide format. Hence, only the long-format files containing the same data were imported into the BigQuery database.
2) To be able to compare and measure the correlation among different variables based on hourly records, I created a new table using a LEFT JOIN on the Id and ActivityHour columns. I repeated the same JOIN on the tables with minute records. This produced 2 new tables: hourly_activity.csv and minute_activity.csv.
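A sketch of the hourly join described above; the project/dataset path is a placeholder, and the table and column names (hourlyCalories, hourlyIntensities, hourlySteps with Calories, TotalIntensity, StepTotal) are assumptions based on the source file names.
```
SELECT
  c.Id,
  c.ActivityHour,
  c.Calories,
  i.TotalIntensity,
  s.StepTotal
FROM `project.dataset.hourlyCalories` AS c
LEFT JOIN `project.dataset.hourlyIntensities` AS i
  ON c.Id = i.Id AND c.ActivityHour = i.ActivityHour
LEFT JOIN `project.dataset.hourlySteps` AS s
  ON c.Id = s.Id AND c.ActivityHour = s.ActivityHour;
```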
3) To validate most of the columns containing DATE and DATETIME values that were imported as the STRING data type, I used the PARSE_DATE() and PARSE_DATETIME() functions. While importing the
- heartrate_seconds_merged.csv,
- hourlyCalories_merged.csv,
- hourlyIntensities_merged.csv,
- hourlySteps_merged.csv,
- minutesCaloriesNarrow_merged.csv,
- minuteIntensitiesNarrow_merged.csv,
- minuteMETsNarrow_merged.csv,
- minuteSleep_merged.csv,
- minuteSteps_merged.csv,
- sleepDay_merge.csv,
- weigthLog_Info_merged.csv
files to BigQuery, it was necessary to import the DATETIME and DATE columns as STRING, because the original syntax used in the CSV files could not be recognized as a valid DATETIME data type, due to the “AM” and “PM” text at the end of the expression.
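For example, a sketch of that conversion, assuming a timestamp string like “4/12/2016 1:00:00 PM”; the exact format string and table path are assumptions.
```
SELECT
  Id,
  PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', ActivityHour) AS activity_hour
FROM `project.dataset.hourlyCalories`;
```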
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains hourly cryptocurrency and stock market data collected from CoinGecko starting in March 2025. The collection pipeline was designed to demonstrate practical data management and automation skills:
- Data Ingestion: A Python script automatically scrapes fresh hourly data from CoinGecko and writes it into Google Sheets.
- Data Offloading: To avoid Google Sheets’ row limitations, Python scripts periodically export data from Sheets into Google BigQuery.
- Data Publishing: The data is shared to Kaggle via a scheduled notebook, ensuring the dataset is updated daily with the latest available records.
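A minimal sketch of the ingestion step, assuming the public CoinGecko /coins/markets endpoint; the Google Sheets and BigQuery writes described above are omitted here.
```
import requests

# Fetch current market data for the top coins (run hourly in the real pipeline).
resp = requests.get(
    "https://api.coingecko.com/api/v3/coins/markets",
    params={"vs_currency": "usd", "per_page": 50, "page": 1},
    timeout=30,
)
resp.raise_for_status()
rows = [
    {"id": c["id"], "price_usd": c["current_price"], "market_cap": c["market_cap"]}
    for c in resp.json()
]
print(rows[:3])
```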
This setup provides a reliable, reproducible data stream that can be used for:
- Practicing SQL queries for data extraction, filtering, and preparation before analysis
- Exploratory data analysis of crypto and stock price movements
- Building time-series forecasting models
- Studying correlations between global assets
- Demonstrating real-world ETL (Extract, Transform, Load) and data pipeline engineering
The dataset is updated hourly, making it suitable both for live monitoring and for historical trend analysis.
This project analyzes Kimia Farma's performance from 2020 to 2023 using Google Looker Studio. The analysis is based on a pre-processed dataset stored in BigQuery, which serves as the data source for the dashboard.
The dashboard is designed to provide insights into branch performance, sales trends, customer ratings, and profitability. The development is ongoing, with multiple pages planned for a more in-depth analysis.
✅ The first page of the dashboard is completed
✅ A sample dashboard file is available on Kaggle
🔄 Development will continue with additional pages
The dataset consists of transaction records from Kimia Farma branches across different cities and provinces. Below are the key columns used in the analysis:
- transaction_id: Transaction ID code
- date: Transaction date
- branch_id: Kimia Farma branch ID code
- branch_name: Kimia Farma branch name
- kota: City of the Kimia Farma branch
- provinsi: Province of the Kimia Farma branch
- rating_cabang: Customer rating of the Kimia Farma branch
- customer_name: Name of the customer who made the transaction
- product_id: Product ID code
- product_name: Name of the medicine
- actual_price: Price of the medicine
- discount_percentage: Discount percentage applied to the medicine
- persentase_gross_laba: Gross profit percentage, based on the price brackets below (see the SQL sketch after this column list):
Price ≤ Rp 50,000 → 10% profit
Price > Rp 50,000 - 100,000 → 15% profit
Price > Rp 100,000 - 300,000 → 20% profit
Price > Rp 300,000 - 500,000 → 25% profit
Price > Rp 500,000 → 30% profit
- nett_sales: Price after discount
- nett_profit: Profit earned by Kimia Farma
- rating_transaksi: Customer rating of the transaction
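A sketch of how persentase_gross_laba could be derived from actual_price using the brackets above; the table path is a placeholder, and the column names follow the list above.
```
SELECT
  transaction_id,
  actual_price,
  CASE
    WHEN actual_price <= 50000  THEN 0.10
    WHEN actual_price <= 100000 THEN 0.15
    WHEN actual_price <= 300000 THEN 0.20
    WHEN actual_price <= 500000 THEN 0.25
    ELSE 0.30
  END AS persentase_gross_laba
FROM `project.dataset.kimia_farma_transactions`;
```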
📌 kimia farma_query.txt – Contains SQL queries used for data analysis in Looker Studio
📌 kimia farma_analysis_table.csv – Preprocessed dataset ready for import and analysis
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a cleaned version of the Chicago Crime Dataset, which can be found here. All rights for the dataset go to the original owners. The purpose of this dataset is to display my skills in visualizations and creating dashboards. Specifically, I will attempt to create a dashboard that allows users to see metrics for a specific crime within a given year using filters. Because of this, there will not be much focus on analyzing the data, but there will be portions discussing the validity of the dataset, the steps I took to clean it, and how I organized it. The cleaned datasets can be found below, the query (which utilized BigQuery) can be found here, and the Tableau dashboard can be found here.
The dataset comes directly from the City of Chicago's website, under the page "City Data Catalog." The data is gathered directly from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and is updated daily to present the information accurately. This means that a crime on a specific date may be changed later to better reflect the case. The dataset covers crimes from 2001 up to seven days prior to today's date.
Using the ROCCC method, we can see that:
* The data has high reliability: The data covers the entirety of Chicago over a little more than two decades. It covers all the wards within Chicago and even gives the street names. While we may not know exactly how big the sample size is, I believe the dataset has high reliability since it geographically covers the entirety of Chicago.
* The data has high originality: The dataset was obtained directly from the Chicago Police Department's database, so we can say this dataset is original.
* The data is somewhat comprehensive: While we do have important information such as the types of crimes committed and their geographic location, I do not think this gives us proper insight into why these crimes take place. We can pinpoint the location of a crime, but we are limited by the information we have. How hot was the day of the crime? Did the crime take place in a low-income neighborhood? These missing factors prevent us from getting proper insight into why these crimes take place, so I would say the dataset is subpar in how comprehensive it is.
* The data is current: The dataset is updated frequently to display crimes that took place up to seven days prior to today's date and may even update past crimes as more information comes to light. Due to the frequent updates, I believe the data is current.
* The data is cited: As mentioned earlier, the data is collected directly from the police's CLEAR system, so we can say that the data is cited.
The purpose of this step is to clean the dataset such that there are no outliers in the dashboard. To do this, we are going to do the following: * Check for any null values and determine whether we should remove them. * Update any values where there may be typos. * Check for outliers and determine if we should remove them.
The following steps are explained in the code segments below. (I used BigQuery for this, so the code follows BigQuery's SQL syntax.)
```
-- Preview the table
SELECT *
FROM `portfolioproject-350601.ChicagoCrime.Crime`
LIMIT 1000;

-- Find rows with NULL values in the key columns
SELECT *
FROM `portfolioproject-350601.ChicagoCrime.Crime`
WHERE
  unique_key IS NULL OR
  case_number IS NULL OR
  date IS NULL OR
  primary_type IS NULL OR
  location_description IS NULL OR
  arrest IS NULL OR
  longitude IS NULL OR
  latitude IS NULL;

-- Remove those rows
DELETE FROM `portfolioproject-350601.ChicagoCrime.Crime`
WHERE
  unique_key IS NULL OR
  case_number IS NULL OR
  date IS NULL OR
  primary_type IS NULL OR
  location_description IS NULL OR
  arrest IS NULL OR
  longitude IS NULL OR
  latitude IS NULL;

-- Check for duplicate unique_key values (query truncated in the original)
SELECT unique_key, COUNT(unique_key) FROM `portfolioproject-350601.ChicagoCrime....
```
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Estimated precipitation rate for areas of southeastern Brazil. Estimates are made hourly, with each record containing the data for one estimate. Each area is a square 4 km on a side. Data collected by the GOES-16 satellite.
How to access
On this page
Here you will find a button to download the data in CSV format, compressed with gzip. Or, for the same result, you can click here.
BigQuery
SELECT *
FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite`
LIMIT 1000
Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>",
)
R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000")
Temporal coverage
From 2020 to the current date
Update frequency
Daily
Managing agency
Centro de Operações da Prefeitura do Rio (COR)
Columns
- latitude: Latitude of the center of the area.
- longitude: Longitude of the center of the area.
- rrqpe: Estimated precipitation rate, measured in millimeters per hour.
- primary_key: Primary key created by concatenating the date, time, latitude, and longitude columns. Used to avoid duplicate records.
- horario: Time at which the measurement was taken.
- data_particao: Date on which the measurement was taken.
Publisher information
Name: Patrícia Catandi
Email: patriciabcatandi@gmail.com
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Data on the rain gauge stations of AlertaRio (the Rio de Janeiro City Hall's Alerta Rio system) in the city of Rio de Janeiro.
How to access
On this page
Here you will find a button to download the data in CSV format, compressed with gzip. Or, for the same result, you can click here.
BigQuery
SELECT *
FROM `datario.meio_ambiente_clima.estacoes_alertario`
LIMIT 1000
Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>",
)
R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000")
Temporal coverage
N/A
Update frequency
Annual
Managing agency
COR
Columns
- x: X UTM coordinate (SAD69, zone 23).
- longitude: Longitude where the station is located.
- id_estacao: Station ID defined by AlertaRIO.
- estacao: Station name.
- latitude: Latitude where the station is located.
- cota: Height in meters at which the station is located.
- endereco: Full address of the station.
- situacao: Indicates whether the station is operational or faulty.
- data_inicio_operacao: Date on which the station began operating.
- data_fim_operacao: Date on which the station stopped operating.
- data_atualizacao: Last date on which the information about the operation dates was updated.
- y: Y UTM coordinate (SAD69, zone 23).
Publisher information
Name: Patricia Catandi
Email: patriciabcatandi@gmail.com
https://creativecommons.org/publicdomain/zero/1.0/
Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries.
Over 9000 stations' data are typically available.
The daily elements included in the dataset (as available from each station) are:
- Mean temperature (.1 Fahrenheit)
- Mean dew point (.1 Fahrenheit)
- Mean sea level pressure (.1 mb)
- Mean station pressure (.1 mb)
- Mean visibility (.1 miles)
- Mean wind speed (.1 knots)
- Maximum sustained wind speed (.1 knots)
- Maximum wind gust (.1 knots)
- Maximum temperature (.1 Fahrenheit)
- Minimum temperature (.1 Fahrenheit)
- Precipitation amount (.01 inches)
- Snow depth (.1 inches)
- Indicators for the occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_gsod.[TABLENAME]. Fork this kernel to get started and to learn how to safely manage analyzing large BigQuery datasets.
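A sketch of a query against one of the year-sharded GSOD tables; the gsod2020 table, the mo and temp columns, and the 9999.9 missing-value sentinel are assumptions based on GSOD conventions.
```
-- Average daily mean temperature per month for 2020
SELECT mo AS month, ROUND(AVG(temp), 1) AS avg_temp_f
FROM `bigquery-public-data.noaa_gsod.gsod2020`
WHERE temp < 9999.9  -- exclude the missing-value sentinel
GROUP BY mo
ORDER BY mo;
```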
This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations. Dataset Source: NOAA
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Photo by Allan Nygren on Unsplash
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Standard operating procedures (POPs) in use at the PCRJ (Rio de Janeiro City Hall). A POP is a procedure used to resolve an event and is made up of several activities. An event is an occurrence in the city of Rio de Janeiro that requires monitoring and, in most cases, action by the PCRJ, such as a pothole in the street. Also accessible through the Data Office API: https://api.dados.rio/v1/
How to access
On this page
Here you will find a button to download the data in CSV format, compressed with gzip. Or, for the same result, you can click here.
BigQuery
SELECT *
FROM `datario.adm_cor_comando.procedimento_operacional_padrao`
LIMIT 1000
Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.adm_cor_comando.procedimento_operacional_padrao` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>",
)
R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.adm_cor_comando.procedimento_operacional_padrao` LIMIT 1000")
Temporal coverage
Not informed.
Update frequency
Monthly
Managing agency
COR
Columns
- id_pop: Identifier of the POP (standard operating procedure).
- pop_titulo: Name of the standard operating procedure.
Publisher information
Name: Patrícia Catandi
Email: patriciabcatandi@gmail.com
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Service requests made to 1746, Rio de Janeiro's citizen service hotline. Requests date back to March 2011, when the 1746 project began.
How to access
On this page
Here you will find a button to download the data in CSV format, compressed with gzip. Or, for the same result, you can click here.
BigQuery
SELECT *
FROM `datario.administracao_servicos_publicos.chamado_1746`
LIMIT 1000
Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>",
)
R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000")
Temporal coverage
Since March 2011
Update frequency
Daily
Managing agency
SEGOVI
Columns
- id_chamado: Unique identifier of the request in the database.
- data_inicio: Date the request was opened, recorded when the operator registers it.
- data_fim: Date the request was closed. A request is closed when it is fulfilled or when it becomes clear that it cannot be fulfilled.
- id_bairro: Unique database identifier of the neighborhood where the event that generated the request occurred.
- id_territorialidade: Unique database identifier of the territoriality where the event that generated the request occurred. A territoriality is a region of the city of Rio de Janeiro that has a specific agency as its responsible body, for example CDURP, which is responsible for the port region of Rio de Janeiro.
- id_logradouro: Unique database identifier of the street where the event that generated the request occurred.
- numero_logradouro: Street number where the event that generated the request occurred.
- id_unidade_organizacional: Unique database identifier of the agency that handles the request, for example the identifier of COMLURB when the request relates to urban cleaning.
- nome_unidade_organizacional: Name of the agency that handles the request, for example COMLURB when the request relates to urban cleaning.
- unidade_organizadional_ouvidoria: Boolean indicating whether the citizen's request was made through the Ombudsman's Office (Ouvidoria): 1 if yes, 0 if not.
- categoria: Category of the request, for example service, information, suggestion, praise, complaint, criticism.
- id_tipo: Unique database identifier of the request type, e.g. public lighting.
- tipo: Name of the request type, e.g. public lighting.
- id_subtipo: Unique database identifier of the request subtype, e.g. repair of a burned-out streetlight.
- subtipo: Name of the request subtype, e.g. repair of a burned-out streetlight.
- status: Status of the request, e.g. closed with a solution, open in progress, pending, etc.
- longitude: Longitude of the place of the event that motivated the request.
- latitude: Latitude of the place of the event that motivated the request.
- data_alvo_finalizacao: Target date for fulfilling the request. If prazo_tipo is D, it remains blank until the diagnosis is made.
- data_alvo_diagnostico: Target date for diagnosing the service. If prazo_tipo is F, this date remains blank.
- data_real_diagnostico: Date on which the service diagnosis was made. If prazo_tipo is F, this date remains blank.
- tempo_prazo: Deadline for the service to be carried out, in days or hours after the request is opened. If a diagnosis is required, the deadline counts from when the diagnosis is made.
- prazo_unidade: Time unit used for the deadline: days or hours (D or H).
- prazo_tipo: Diagnosis or completion (D or F). Indicates whether the request requires a diagnosis. Some services need an assessment before they can be carried out, in which case a diagnosis is made; for example, tree pruning requires an environmental engineer to verify whether pruning is needed.
- id_unidade_organizacional_mae: ID of the parent organizational unit of the agency that handles the request. For example, "CVA - Coordenação de Vigilância de Alimentos" handles the request and reports to the parent unit "IVISA-RIO - Instituto Municipal de Vigilância Sanitária, de Zoonoses e de Inspeção Agropecuária"; this column refers to the ID of the latter.
- situacao: Indicates whether the request has been closed.
- tipo_situacao: Indicates the current status of the request among the categories Fulfilled, Partially fulfilled, Not fulfilled, Not verified, and In progress.
- dentro_prazo: Indicates whether the target completion date of the request is still within the stipulated deadline.
- justificativa_status: Justification agencies give when setting the status, for example "SEM POSSIBILIDADE DE ATENDIMENTO - justificativa: Fora de área de atuação do municipio" (no possibility of service: outside the municipality's area of operation).
- reclamacoes: Number of complaints.
Publisher information
Name: Patricia Catandi
Email: patriciabcatandi@gmail.com
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.
Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).
Update Frequency: Weekly
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.openaq.[TABLENAME]. Fork this kernel to get started.
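A sketch of a query against this dataset, assuming the global_air_quality table and its pollutant, value, and unit columns.
```
-- Countries with the highest average latest PM2.5 readings
SELECT country, AVG(value) AS avg_value, ANY_VALUE(unit) AS unit
FROM `bigquery-public-data.openaq.global_air_quality`
WHERE pollutant = 'pm25'
GROUP BY country
ORDER BY avg_value DESC
LIMIT 10;
```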
Dataset Source: openaq.org
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source and is provided "AS IS" without any warranty, express or implied.
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Occurrences dispatched by the COR since 2015. An occurrence in the city of Rio de Janeiro is an event that requires monitoring and, in most cases, action by the PCRJ, for example a pothole in the road, a pool of standing water, or a mechanical breakdown. An open occurrence is one that has not yet been resolved. Also accessible through the Data Office API: https://api.dados.rio/v1/
How to access
On this page
Here you will find a button to download the data in CSV format, compressed with gzip. Or, for the same result, you can click here.
BigQuery
SELECT *
FROM `datario.adm_cor_comando.ocorrencias`
LIMIT 1000
Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>",
)
R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000")
Temporal coverage
Not informed.
Update frequency
Daily
Managing agency
COR
Columns
- data_inicio: Date and time the event was registered by the PCRJ.
- data_fim: Date and time the event was closed by the PCRJ. An event is closed when it is resolved; this field is empty while the event is still open.
- bairro: Neighborhood where the event occurred.
- id_pop: POP identifier.
- status: Event status (ABERTO = open, FECHADO = closed).
- gravidade: Event severity (BAIXO, MEDIO, ALTO, CRITICO).
- prazo: Expected resolution timeframe for the event (CURTO, MEDIO (more than 3 days), LONGO (more than 5 days)).
- latitude: Latitude (WGS-84) where the event occurred.
- longitude: Longitude (WGS-84) where the event occurred.
- id_evento: Event identifier.
- descricao: Event description.
- tipo: Event type (PRIMARIO, SECUNDARIO).
Publisher information
Name: Patrícia Catandi
Email: patriciabcatandi@gmail.com