Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.
The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents
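As a quick-start illustration (not part of the original dataset documentation), here is a minimal Python sketch that runs a small query against the publications table with the google-cloud-bigquery client. The table path `patents-public-data.patents.publications` and the `country_code` column follow the commonly documented schema, but verify them in the BigQuery console before relying on the query; the billing project ID is a placeholder.

from google.cloud import bigquery

# Queries are billed to your own GCP project and count against BigQuery's free tier.
client = bigquery.Client(project="your-gcp-project-id")
sql = """
SELECT country_code, COUNT(*) AS publication_count
FROM `patents-public-data.patents.publications`
GROUP BY country_code
ORDER BY publication_count DESC
LIMIT 10
"""
for row in client.query(sql).result():
    print(row.country_code, row.publication_count)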
“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.
Banner photo by Helloquence on Unsplash
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It is a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.
The data is typical of what an ecommerce website would see and includes the following information:
Traffic source data: information about where website visitors originate, including organic traffic, paid search traffic, and display traffic.
Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.
Transactional data: information about the transactions on the Google Merchandise Store website.
Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated (such as fullVisitorId) or removed (such as clientId, adWordsClickInfo, and geoNetwork). "Not available in demo dataset" is returned for STRING values and "null" is returned for INTEGER values when querying fields that contain no data.
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
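To make the BigQuery access concrete, the hedged Python sketch below sums visits and transactions per day for one month of the sample; the table wildcard `bigquery-public-data.google_analytics_sample.ga_sessions_*` and the `totals.visits` / `totals.transactions` fields follow the standard Analytics 360 export schema, but confirm them against the dataset before use.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")  # billed to your own project
sql = """
SELECT _TABLE_SUFFIX AS day,
       SUM(totals.visits) AS visits,
       SUM(totals.transactions) AS transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY day
ORDER BY day
"""
for row in client.query(sql).result():
    print(row.day, row.visits, row.transactions)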
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under Section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers, or consignees, of articles that infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that could destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked, or misbranded goods, sales of goods at unfairly low prices, other antitrust violations such as price fixing or market division, or goods that violate a standard applicable to such goods.
US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations
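If you prefer to explore the dataset outside a notebook, the hedged Python sketch below enumerates the tables in the public dataset and previews a few rows of each. It uses the google-cloud-bigquery client rather than the BQhelper package mentioned above, and assumes only the dataset path `patents-public-data.usitc_investigations` from the Data Origin link; the billing project ID is a placeholder.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
# List the tables in the public dataset, then preview the first few rows of each.
for table in client.list_tables("patents-public-data.usitc_investigations"):
    print(table.table_id)
    for row in client.list_rows(f"patents-public-data.usitc_investigations.{table.table_id}", max_results=3):
        print(dict(row.items()))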
"US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.
Banner photo by João Silas on Unsplash
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.
This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']
This dataset is deprecated and not being updated.
Fork this kernel to get started with this dataset.
Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @meric from Unsplash.
Which neighborhoods have the highest proportion of offensive graffiti?
Which complaint is most likely to be made using Twitter and in which neighborhood?
What are the most complained about Muni stops in San Francisco?
What are the top 10 incident types that the San Francisco Fire Department responds to?
How many medical incidents and structure fires are there in each neighborhood?
What’s the average response time for each type of dispatched vehicle?
Which category of police incidents has historically been the most common in San Francisco?
What were the most common police incidents in the category of LARCENY/THEFT in 2016?
Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?
What is the average tree diameter?
What is the highest number of a particular species of tree planted in a single year?
Which San Francisco locations feature the largest number of trees?
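As an example of how one of these questions can be answered directly in BigQuery, the hedged Python sketch below computes the average tree diameter by species from the street_trees table; the `dbh` (diameter at breast height) and `species` column names are assumptions about the table's schema, so check them before use.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
sql = """
SELECT species, COUNT(*) AS tree_count, AVG(dbh) AS avg_diameter
FROM `bigquery-public-data.san_francisco.street_trees`
GROUP BY species
ORDER BY tree_count DESC
LIMIT 10
"""
for row in client.query(sql).result():
    print(row.species, row.tree_count, row.avg_diameter)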
The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated by a "_chr*" suffix). Using the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them into BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms' annotation support. These public datasets are included in BigQuery's 1 TB/mo of free tier processing: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on these public datasets. Find out more in the blog post "Providing open access to gnomAD on Google Cloud". Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
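Because the variants are sharded across per-chromosome tables, a reasonable first step is to enumerate the tables before querying one shard. The Python sketch below does that with the google-cloud-bigquery client; the dataset path `bigquery-public-data.gnomAD` is an assumption about where the tables are hosted, so confirm it (and the exact table names) in the BigQuery console before running queries.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
# List the per-chromosome shards ("_chr*" suffix) described above.
for table in client.list_tables("bigquery-public-data.gnomAD"):
    print(table.table_id)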
This public dataset includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. It contains the following tables: games_wide (every pitch, steal, or lineup event for each at-bat in the 2016 regular season), games_post_wide (every pitch, steal, or lineup event for each at-bat in the 2016 postseason), and schedules (the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams. Note: this data was built via a denormalization process over raw game log files, which may contain scoring errors and, in some cases, missing data. For official scoring and statistical information, please consult mlb.com, baseball-reference.com, or sportradar.com. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
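A simple way to get oriented before replaying games is to inspect the schema of the games_wide table. The hedged Python sketch below prints its columns using the google-cloud-bigquery client; the table path `bigquery-public-data.baseball.games_wide` follows the description above, but verify it in the BigQuery console.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
table = client.get_table("bigquery-public-data.baseball.games_wide")
# Print every column name and type so you know which fields to aggregate on.
for field in table.schema:
    print(field.name, field.field_type)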
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Labeled datasets are useful in machine learning research.
This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.
Tables: 1) annotations_bbox 2) dict 3) images 4) labels
Update Frequency: Quarterly
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images
https://cloud.google.com/bigquery/public-data/openimages
APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.
Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.
The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
Banner Photo by Mattias Diesel from Unsplash.
Which labels are in the dataset?
Which labels have "bus" in their display names?
How many images of a trolleybus are in the dataset?
What are some landing pages of images with a trolleybus?
Which images with cherries are in the training set?
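As a starting point for the first two questions, the hedged Python sketch below looks for labels whose display name contains "bus"; the `dict` table and its `label_name` / `label_display_name` columns are assumptions about the dataset's schema, so confirm them before running the query.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
sql = """
SELECT label_name, label_display_name
FROM `bigquery-public-data.open_images.dict`
WHERE LOWER(label_display_name) LIKE '%bus%'
"""
for row in client.query(sql).result():
    print(row.label_name, row.label_display_name)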
Global B2B Company Database | 65M+ Verified Firms | Firmographics Forget stale corporate directories – Forager.ai delivers living, breathing company intelligence trusted by VCs, Fortune 500 teams, and SaaS leaders. Our 65 million+ AI-validated company profiles are refreshed every 14 days to track leadership changes, tech migrations, and growth signals competitors miss.
Why This Outperforms Generic Firmographics ✅ AI That Works Like Your Best Analyst Cross-references 12+ sources to: ✔ Flag companies hiring sales teams → Ready to buy ✔ Detect tech stack changes → Migration opportunities ✔ Identify layoffs/expansions → Timely outreach windows
✅ Freshness That Matters We update 100% of records every 2-3 weeks – critical for tracking:
Funding rounds and revenue
Company job posts
✅ Ethical & Audit-Ready Full GDPR/CCPA compliance with:
Usage analytics dashboard
Your Secret Weapon for: 🔸 Sales Teams: → Identify high-growth targets 83% faster (employee growth + tech stack filters) → Prioritize accounts with "hiring spree" or "new funding" tags
🔸 Investors: → Track 18K+ private companies with revenue/employee alerts → Portfolio monitoring with 92% prediction accuracy on revenue shifts
🔸 Marketers: → ABM campaigns powered by technographics (Slack → Teams migrators) → Event targeting using travel patterns (HQ → conference city matches)
🔸 Data Teams: → Enrich Snowflake/Redshift warehouses via API → Build custom models with 150+ firmographic/technographic fields
Core Data Points ✔ Financial Health: Revenue ranges, funding history, growth rate estimates ✔ Tech Stack: CRM, cloud platforms, marketing tools, web technologies used ✔ People Moves: C-suite changes, employee headcount ✔ Expansion Signals: New offices, job postings
Enterprise-Grade Delivery
API: credit-based lookups to find companies using any field in the schema; returns name, domain, industry, headcount, location, LinkedIn URL, etc.
Cloud Sync: Auto-update Snowflake/Redshift/BigQuery
CRM Push: Direct to Salesforce/HubSpot/Pipedrive
Flat Files: CSV/JSON
Why Clients Never Go Back to Legacy Providers → 6-Month ROI Guarantee – We’ll beat your current vendor or extend your plan → Free Data Audit – Upload your CRM list → We’ll show gaps/opportunities → Live Training – Our analysts teach you to mine hidden insights
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# An Empirical Study of Proxy Smart Contracts at Ethereum Ecosystem Scale
In this work, we conduct the first comprehensive study on Ethereum proxies. We organize our data and code into three sections as follows, aligning with the structure of our paper.
* **1. Proxy Contract Preparation.** To collect a comprehensive dataset of proxies, we propose *ProxyEx*, the first framework designed to detect proxies directly from bytecode.
* **2. Logic Contract Preparation.** To analyze the logic contracts of proxies, we extract transactions and traces from all the related proxies in order to identify their logic contracts.
* **3. Three Research Questions.** In this paper, we conduct the first systematic study on proxies on Ethereum, aiming to answer the following research questions.
* RQ1: Statistics. How many proxies are there on Ethereum? How often do proxies modify their logic? How many transactions are executed on proxies?
* RQ2: Purpose. What are the major purposes of implementing proxy patterns for smart contracts?
* RQ3: Bugs and Pitfalls. What types of bugs and pitfalls can exist in proxies?
## 1. Proxy Contract Preparation
To facilitate proxy contract data collection, we design a system, *ProxyEx*, to detect proxy contracts from contract bytecode.
#### Environment Setup
First make sure you have the following things installed on your system:
* Boost libraries (can be installed on Debian with apt install libboost-all-dev)
* Python 3.8
* Souffle 2.3 or 2.4
Now install the Souffle custom functors (see below); you should then be ready to run *ProxyEx*.
* run *cd proxy-contract-prep/proxyex/souffle-addon && make*
#### Step 1: unzip the contracts to obtain the bytecode
We collect all the on-chain smart contract bytecode as of September 10, 2023. In total, we have *62,578,635* smart contracts.
* download *contracts.zip* under *proxyex* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing) and place it under *proxy-contract-prep*
* run *unzip contracts.zip* under *proxy-contract-prep* ---> generates the contract bytecode under *proxy-contract-prep/contracts*
#### Step 2: run the proxy detection script in parallel
To speed up the detection process, we run multiple Python scripts in parallel.
* run *bash proxy.sh* under *proxy-contract-prep/scripts-run* ---> generates all the results under *proxy-contract-prep/version1*
* run *bash kill.sh* under *proxy-contract-prep/scripts-run* ---> kills all the running scripts
#### Step 3: analyze all the proxy detection results
We apply *ProxyEx* to the smart contract bytecode with a timeout of *60* seconds; there are *2,031,422* proxy addresses in total (3.25\%). The average detection times for proxy and non-proxy contracts are *14.85* seconds and *3.88* seconds, respectively.
* download *first_contract.json* under *proxyex* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing) and place it under *proxy-contract-prep/scripts-analyze*
* run *python3 analyze_proxy.py* under *proxy-contract-prep/scripts-analyze* ---> generates all the results under *proxy-contract-prep/scripts-analyze/stats1* for later analysis.
* we have already uploaded our analysis results as *stats1.zip* on [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing); run *unzip stats1.zip* under *proxy-contract-prep/scripts-analyze* and you will get results such as *all_proxy.txt*, which lists all the 2,031,422 proxy addresses.
#### Step 4: manually analyze 1k contracts for accuracy
To evaluate its effectiveness and performance, we randomly sampled 1,000 contracts from our dataset. Our examination revealed *548* proxy addresses and *452* non-proxy addresses.
*ProxyEx* misclassified one proxy as non-proxy (false negative), indicating that our framework achieves *100\%* precision and over *99\%* recall.
* *proxy-contract-prep/1k.csv* displays our manually checked results of 1,000 randomly sampled contracts
## 2. Logic Contract Preparation
To extract logic contract addresses, we gather all the transaction traces associated with a *DELEGATECALL* sent from the proxy contracts. We collect a 3-tuple *{FromAddr, ToAddr, CallType}* for every trace from the Google BigQuery APIs, which we subsequently aggregate into transactions. In total, we collect 172,709,392 transactions for all the 2,031,422 proxy contracts.
#### Step 1: extract transaction traces
We run the SQL to download all the traces related to all our proxy contracts.
* run *SELECT * FROM `bigquery-public-data.crypto_ethereum.traces` WHERE from_address IN ( SELECT trace FROM `moonlit-ceiling-399321.gmugcp.traces` ) or to_address IN ( SELECT trace FROM `moonlit-ceiling-399321.gmugcp.traces` ) ORDER BY transaction_hash*; in particular, *moonlit-ceiling-399321.gmugcp.traces* is the table containing all the proxy contract addresses from *proxy-contract-prep/scripts-analyze/stats1/all_proxy.txt*.
* the full set of transaction traces takes around 1.3 TB of storage, so we cannot upload all of it here. A segment of the data is stored in "logic-contract-prep/data/sample.json"
* you can fetch all the data using the url *https://storage.googleapis.com/tracesdata/xxx.json*, where *xxx* starts from *000000000000* to *000000004429*.
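A minimal Python sketch for fetching the shards, assuming only the URL pattern and shard range given above (the output directory is a placeholder; the full set is roughly 1.3 TB, so download selectively):

    import os
    import urllib.request

    out_dir = "traces"  # placeholder output directory
    os.makedirs(out_dir, exist_ok=True)
    # Shard names run from 000000000000 to 000000004429.
    for i in range(4430):
        name = f"{i:012d}.json"
        url = f"https://storage.googleapis.com/tracesdata/{name}"
        urllib.request.urlretrieve(url, os.path.join(out_dir, name))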
#### Step 2: extract logic contracts
We aggregate the transaction traces into transactions and obtain the related logic contracts for every proxy contract, sorted by the timestamp (block number).
* run "analyze.py" under *logic-contract-prep/scripts-analyze* ---> generate all the results under *logic-contract-prep/scripts-analyze/impl.json*
* however, the impl.json costs 30GB, which is too large to be put here; therefore, we generate a sample *logic-contract-prep/scripts-analyze/sample_impl.json*
* also, you can fetch the whole *impl.json* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing)
## 3. Three Research Questions
#### RQ1 - Statistics
We perform statistical analysis of the proxy contracts, including bytecode duplication, transaction count, and lifespan.
* Bytecode Duplication: run "iv_rq1_figure3.py" under *three-research-questions/rq1/script/* (relies on the "iv_rq1_figure3.csv" data file under *three-research-questions/rq1/data/*) ---> generates Figure 3 in the paper.
* Transaction Count: run "iv_rq1_figure4.py" under *three-research-questions/rq1/script/* (relies on the "iv_rq1_figure4.txt" data file under *three-research-questions/rq1/data/*) ---> generates Figure 4 in the paper.
* Lifespan: run "iv_rq1_figure5.py" under *three-research-questions/rq1/script/* (relies on the "iv_rq1_figure5.txt" data file under *three-research-questions/rq1/data/*) ---> generates Figure 5 in the paper.
#### RQ2 - Purposes
We conducted a manual analysis to understand the purposes of proxy contracts and categorized them into the following four types.
* Upgradeability: run "v_rq2_figure6.py" under *three-research-questions/rq2/script/* (relies on the "v_rq2_figure6.txt" data file under *three-research-questions/rq2/data/*) ---> generates Figure 6 in the paper.
* Extensibility: the 32 contracts identified by the detection algorithm for extensibility proxies are listed in "extensibility_proxies.txt"; one of them, `0x4deca517d6817b6510798b7328f2314d3003abac`, is a vulnerable proxy with a proxy-logic collision bug (labelled "Audius Hack").
* Code-sharing: the file "code_sharing.txt" contains the 1,137,317 code-sharing proxies and 3,309 code-sharing proxy clusters that we identified.
* Code-hiding: the file "code_hiding.txt" contains the 1,213 code-hiding proxies that we identified. The first column is the proxy address, while the second column contains a list of tuples: `claimed logic address in EIP1967 slot`, `actual logic address in execution`, `the block where such discrepancy is observed`.
#### RQ3 - Bugs and Pitfalls
In RQ3 we conduct a semi-automated detection of bugs and pitfalls in proxies.
We leverage a set of automated helpers (as described in the paper) to help us prune non-vulnerable contracts before manual inspection.
The automated helpers can be found in `pitfall-detection-helpers` folder.
Note that the final results were obtained through manual inspection; the helper scripts are only used for data processing to reduce manual effort.
* Proxy-logic collision:
- the file "proxy_logic_collision.txt" shows the 32 proxies that we identified as well as our manual inspection results.
- the file "proxy_logic_collision_detector_evaluation_sampled_txs.txt" lists the 100 transactions sampled to evaluate the reliability of our automated helper which identifies storage slot read/write operations.
* Logic-logic collision:
- the file "logic_logic_collision.txt" contains the 15 proxies that we identified to have logic-logic collisions.
- the file "logic_logic_collision_detector_evaluation_sampled_contract_pairs.csv" lists the 100 new-version/old-version logic contract pairs sampled to evaluate the reliability of our automated helper to identify storage collisions between two logic contracts.
* Uninitialized contract:
- the file "uninitialized.csv" contains 183 proxies that was not initialized in the same transaction of deployment and may be at risk of front-running attack. Whether they are still exploitable (i.e., re-initialize by malicious actors at present) is also labelled in the csv.
- the file "identified_initialize_function_calldata.csv" lists the 100 logic contracts sampled to evaluate the quality of `initialize` calldata extracted by our automated helper.
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset is trash. Who in Austin makes it, who takes it, and where does it go?
Data ranges from 2008 to 2016 and includes dropoff site, load ID, time of load, type of load, weight of load, date, route number, and route type (recycling, street cleaning, garbage, etc.).
This dataset was created by the Austin city government and is hosted on Google Cloud Platform. You can use Kernels to analyze, share, and discuss this data on Kaggle, but if you're looking for real-time updates and bigger data, check out the data on BigQuery, too.
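For a quick look at the BigQuery copy, the hedged Python sketch below totals load weight by route type; the table path `bigquery-public-data.austin_waste.waste_and_diversion` and the `route_type` / `load_weight` columns are assumptions based on the fields listed above, so verify them before running the query.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
sql = """
SELECT route_type, SUM(load_weight) AS total_weight
FROM `bigquery-public-data.austin_waste.waste_and_diversion`
GROUP BY route_type
ORDER BY total_weight DESC
"""
for row in client.query(sql).result():
    print(row.route_type, row.total_weight)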
The Consumer Complaint Database is a collection of complaints about consumer financial products and services that we sent to companies for response. Complaints are published after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first. Complaints referred to other regulators, such as complaints about depository institutions with less than $10 billion in assets, are not published in the Consumer Complaint Database. This database is not a statistical sample of consumers' experiences in the marketplace. Complaints are not necessarily representative of all consumers' experiences, and complaints do not constitute "information" for purposes of the Information Quality Act. Complaint volume should be considered in the context of company size and/or market share. For example, companies with more customers may have more complaints than companies with fewer customers. We encourage you to pair complaint data with public and private datasets for additional context. The Bureau publishes the consumer's narrative description of his or her experience if the consumer opts to share it publicly and after the Bureau removes personal information. We don't verify all the allegations in complaint narratives. Unproven allegations in consumer narratives should be regarded as opinion, not fact. We do not adopt the views expressed and make no representation that consumers' allegations are accurate, clear, complete, or unbiased in substance or presentation. Users should consider what conclusions may be fairly drawn from complaints alone. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
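To pair the caveats above with a concrete query, the hedged Python sketch below counts complaints per product; the table path `bigquery-public-data.cfpb_complaints.complaint_database` and the `product` column reflect the public BigQuery copy as commonly documented, but confirm them before use.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
sql = """
SELECT product, COUNT(*) AS complaint_count
FROM `bigquery-public-data.cfpb_complaints.complaint_database`
GROUP BY product
ORDER BY complaint_count DESC
"""
for row in client.query(sql).result():
    print(row.product, row.complaint_count)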
Citi Bike is the nation's largest bike share program, with 10,000 bikes and 600 stations across Manhattan, Brooklyn, Queens, and Jersey City. This dataset includes Citi Bike trips since the program launched in September 2013 and is updated daily. The data has been processed by Citi Bike to remove trips taken by staff to service and inspect the system, as well as any trips shorter than 60 seconds, which are considered false starts. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
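As a small worked example, the hedged Python sketch below ranks the busiest start stations; the table path `bigquery-public-data.new_york_citibike.citibike_trips` and the `start_station_name` column follow the commonly documented schema, but verify them in the BigQuery console.

from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
sql = """
SELECT start_station_name, COUNT(*) AS trips
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY start_station_name
ORDER BY trips DESC
LIMIT 10
"""
for row in client.query(sql).result():
    print(row.start_station_name, row.trips)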
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Historical geographic position data for BRT vehicles.
The complete data is available for querying and download in the data.rio data lake. The data is captured every minute and processed every hour. Data is subject to change, for example corrections of capture gaps and/or processing adjustments.
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.transporte_rodoviario_municipal.gps_brt`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000")
Temporal coverage: 24/11/2021 to the present
Update frequency: Hourly
Managing agency: Secretaria Municipal de Transportes (SMTR)
Columns (name: description)
modo: mode of transport; this table contains only BRT
timestamp_gps: timestamp at which the GPS signal was emitted
data: date of the GPS signal timestamp
hora: hour of the GPS signal timestamp
id_veiculo: vehicle identifier code (order number)
servico: service operated by the vehicle
latitude: geographic coordinate (y axis) in decimal degrees (EPSG:4326 - WGS84)
longitude: geographic coordinate (x axis) in decimal degrees (EPSG:4326 - WGS84)
flag_em_movimento: vehicles with 'velocidade' below 'velocidade_limiar_parado' are considered stopped (false); otherwise they are considered moving (true)
tipo_parada: identifies vehicles stopped at terminals or garages
flag_linha_existe_sigmob: flag indicating whether the reported line exists in SIGMOB
velocidade_instantanea: instantaneous vehicle speed as reported by the GPS (km/h)
velocidade_estimada_10_min: average speed over the last 10 minutes of operation (km/h)
distancia: distance (m) from the previous GPS position to the current position
versao: data version control code (GitHub SHA)
Publisher information
Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
E-mail: dados.smtr@prefeitura.rio
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Estimated precipitation rate for areas of southeastern Brazil. Estimates are produced hourly, with each record containing the data of one estimate. Each area is a square 4 km on a side. Data collected by the GOES-16 satellite.
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000")
Temporal coverage: from 2020 to the current date
Update frequency: Daily
Managing agency: Centro de Operações da Prefeitura do Rio (COR)
Columns (name: description)
latitude: latitude of the center of the area
longitude: longitude of the center of the area
rrqpe: estimated precipitation rate, in millimeters per hour
primary_key: primary key created by concatenating the date, time, latitude, and longitude columns; used to avoid duplicate records
horario: time at which the measurement was taken
data_particao: date on which the measurement was taken
Publisher information
Name: Patrícia Catandi
E-mail: patriciabcatandi@gmail.com
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Data on the rain gauge stations of AlertaRio (the Alerta Rio system of the Rio de Janeiro city government) in the city of Rio de Janeiro.
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.meio_ambiente_clima.estacoes_alertario`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000")
Temporal coverage: N/A
Update frequency: Annual
Managing agency: COR
Columns (name: description)
x: X UTM coordinate (SAD69 zone 23)
longitude: longitude of the station location
id_estacao: station ID defined by AlertaRIO
estacao: station name
latitude: latitude of the station location
cota: elevation, in meters, of the station location
endereco: full address of the station
situacao: indicates whether the station is operational or failing
data_inicio_operacao: date the station began operating
data_fim_operacao: date the station stopped operating
data_atualizacao: last date on which the operation-date information was updated
y: Y UTM coordinate (SAD69 zone 23)
Publisher information
Name: Patricia Catandi
E-mail: patriciabcatandi@gmail.com
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Data on the weather stations of INMET (Instituto Nacional de Meteorologia) in the city of Rio de Janeiro.
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.meio_ambiente_clima.estacoes_inmet`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000")
Temporal coverage: N/A
Update frequency: Never
Managing agency: INMET
Columns (name: description)
id_municipio: 7-digit IBGE municipality code
latitude: latitude of the station location
data_inicio_operacao: date the station began operating
data_fim_operacao: date the station stopped operating
situacao: indicates whether the station is operational or failing
tipo_estacao: indicates whether the station is automatic or manual; may contain nulls
entidade_responsavel: entity responsible for the station
data_atualizacao: last date on which the operation-date information was updated
longitude: longitude of the station location
sigla_uf: state abbreviation
id_estacao: station ID defined by INMET
nome_estacao: station name
Publisher information
Name: Patricia Catandi
E-mail: patriciabcatandi@gmail.com
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Service requests made to the 1746 hotline. Requests are included since March 2011, when the 1746 project began.
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.administracao_servicos_publicos.chamado_1746`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000")
Temporal coverage: since March 2011
Update frequency: Daily
Managing agency: SEGOVI
Columns (name: description)
id_chamado: unique identifier of the request in the database
data_inicio: opening date of the request; set when the operator registers the request
data_fim: closing date of the request; the request is closed when it is fulfilled or when it becomes clear that it cannot be fulfilled
id_bairro: unique database identifier of the neighborhood where the event that triggered the request occurred
id_territorialidade: unique database identifier of the territoriality where the event that triggered the request occurred; a territoriality is a region of the city of Rio de Janeiro overseen by a specific agency, for example CDURP, which is responsible for the port region of Rio de Janeiro
id_logradouro: unique database identifier of the street where the event that triggered the request occurred
numero_logradouro: street number where the event that triggered the request occurred
id_unidade_organizacional: unique database identifier of the agency that handles the request, for example the COMLURB identifier when the request concerns urban cleaning
nome_unidade_organizacional: name of the agency that handles the request, for example COMLURB when the request concerns urban cleaning
unidade_organizadional_ouvidoria: boolean indicating whether the citizen's request was made through the Ombudsman's office (1 if yes, 0 if no)
categoria: category of the request, for example service, information, suggestion, praise, complaint, criticism
id_tipo: unique database identifier of the request type, e.g. public lighting
tipo: name of the request type, e.g. public lighting
id_subtipo: unique database identifier of the request subtype, e.g. repair of a burned-out street lamp
subtipo: name of the request subtype, e.g. repair of a burned-out street lamp
status: status of the request, e.g. closed with solution, open in progress, pending, etc.
longitude: longitude of the location of the event that triggered the request
latitude: latitude of the location of the event that triggered the request
data_alvo_finalizacao: target date for fulfilling the request; if prazo_tipo is D, this remains blank until the diagnosis is made
data_alvo_diagnostico: target date for diagnosing the service; blank if prazo_tipo is F
data_real_diagnostico: date on which the service diagnosis was made; blank if prazo_tipo is F
tempo_prazo: deadline for performing the service, in days or hours after the request is opened; when a diagnosis is required, the deadline counts from when the diagnosis is made
prazo_unidade: unit of time used in the deadline, days or hours (D or H)
prazo_tipo: diagnosis or completion (D or F); indicates whether the request needs a diagnosis. Some services require an assessment before they can be performed, for example tree pruning, where an environmental engineer must verify whether pruning is needed
id_unidade_organizacional_mae: ID of the parent organizational unit of the agency that handles the request; for example, "CVA - Coordenação de Vigilância de Alimentos" handles the request and reports to the parent unit "IVISA-RIO - Instituto Municipal de Vigilância Sanitária, de Zoonoses e de Inspeção Agropecuária"; this column refers to the ID of the latter
situacao: indicates whether the request has been closed
tipo_situacao: indicates the current status of the request among the categories Atendido, Atendido parcialmente, Não atendido, Não constatado, and Andamento
dentro_prazo: indicates whether the target completion date of the request is still within the stipulated deadline
justificativa_status: justification used by the agencies when setting the status, e.g. SEM POSSIBILIDADE DE ATENDIMENTO - justification: outside the municipality's area of operation
reclamacoes: number of complaints
Publisher information
Name: Patricia Catandi
E-mail: patriciabcatandi@gmail.com
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
The complete data is available for querying and download in the data.rio data lake. The data is captured every minute and processed every hour. Data is subject to change, for example corrections of capture gaps and/or processing adjustments.
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.transporte_rodoviario_municipal.gps_onibus`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_onibus` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.transporte_rodoviario_municipal.gps_onibus` LIMIT 1000")
Temporal coverage: 01/03/2021 to the present
Update frequency: Hourly
Managing agency: Secretaria Municipal de Transportes
Columns (name: description)
modo: mode of transport; this table contains only SPPO
timestamp_gps: timestamp at which the GPS signal was emitted
data: date of the GPS signal timestamp
hora: hour of the GPS signal timestamp
id_veiculo: vehicle identifier code (order number)
servico: service operated by the vehicle
latitude: geographic coordinate (y axis) in decimal degrees (EPSG:4326 - WGS84)
longitude: geographic coordinate (x axis) in decimal degrees (EPSG:4326 - WGS84)
flag_em_movimento: vehicles with 'velocidade' below 'velocidade_limiar_parado' are considered stopped (false); otherwise they are considered moving (true)
tipo_parada: identifies vehicles stopped at terminals or garages
flag_linha_existe_sigmob: flag indicating whether the reported line exists in SIGMOB
velocidade_instantanea: instantaneous vehicle speed as reported by the GPS (km/h)
velocidade_estimada_10_min: average speed over the last 10 minutes of operation (km/h)
distancia: distance (m) from the previous GPS position to the current position
fonte_gps: GPS data provider (zirix or conecta)
versao: data version control code (GitHub SHA)
Publisher information
Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
E-mail: dados.smtr@prefeitura.rio
Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Incidents dispatched by COR since 2015. An incident in the city of Rio de Janeiro is an event that requires monitoring and, in most cases, action by the city government (PCRJ), for example a pothole, pooled water on the road, or a mechanical breakdown. An open incident is one that has not yet been resolved. Also accessible through the Data Office API: https://api.dados.rio/v1/
How to access
On this page
A button is available to download the data as gzip-compressed CSV files.
BigQuery
SELECT *
FROM `datario.adm_cor_comando.ocorrencias`
LIMIT 1000
The table can also be opened directly in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
Python
import basedosdados as bd

# To load the data directly into pandas
df = bd.read_sql(
    "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000",
    billing_project_id="<id_do_seu_projeto_gcp>"
)

R
install.packages("basedosdados")
library("basedosdados")

# Set your Google Cloud project
set_billing_id("<id_do_seu_projeto_gcp>")

# To load the data directly into R
tb <- read_sql("SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000")
Temporal coverage: not specified
Update frequency: Daily
Managing agency: COR
Columns (name: description)
data_inicio: date and time the event was registered by PCRJ
data_fim: date and time the event was closed by PCRJ; the event is closed when it is resolved; this attribute is empty while the event is open
bairro: neighborhood where the event occurred
id_pop: POP identifier
status: event status (ABERTO, FECHADO)
gravidade: event severity (BAIXO, MEDIO, ALTO, CRITICO)
prazo: expected resolution timeframe for the event (CURTO, MEDIO (more than 3 days), LONGO (more than 5 days))
latitude: latitude, in WGS-84 format, where the event occurred
longitude: longitude, in WGS-84 format, where the event occurred
id_evento: event identifier
descricao: event description
tipo: event type (PRIMARIO, SECUNDARIO)
Publisher information
Name: Patrícia Catandi
E-mail: patriciabcatandi@gmail.com