21 datasets found

Google Patents Public Data
kaggle.com
zip
Updated Sep 19, 2018
Cite
Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/bigquery/patents
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Sep 19, 2018
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

Content

The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

Acknowledgements

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

Banner photo by Helloquence on Unsplash
Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&inv=1&invt=AbzttQ (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Google (http://google.com/)
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

The data is typical of what an ecommerce website would see and includes the following information:
Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions on the Google Merchandise Store website.

Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
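The placeholder behavior described under Limitations can be normalized on the client side before analysis. The following is a minimal sketch, not part of the dataset documentation: the helper name `clean_row` and the sample row are hypothetical, and only the placeholder string and the field names `fullVisitorId` and `clientId` come from the description above.

```python
from typing import Any, Dict

# Placeholder string quoted in the dataset description for STRING fields
# that contain no data.
PLACEHOLDER = "Not available in demo dataset"


def clean_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of row with the demo placeholder replaced by None."""
    return {key: (None if value == PLACEHOLDER else value)
            for key, value in row.items()}


if __name__ == "__main__":
    # Made-up row for illustration; real query results will look different.
    sample = {
        "fullVisitorId": "0000000000000000001",
        "clientId": PLACEHOLDER,
        "visits": 1,
    }
    print(clean_row(sample))
```

Mapping the placeholder to `None` treats removed STRING fields the same way as the nulls returned for INTEGER fields, so downstream code only has to check for one kind of missing value.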
Intellectual Property Investigations by the USITC
kaggle.com
zip
Updated Feb 12, 2019
Cite
Google BigQuery (2019). Intellectual Property Investigations by the USITC [Dataset]. https://www.kaggle.com/bigquery/usitc-investigations
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers or consignees, of articles which infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked or misbranded goods, goods sold at unfairly low prices, other antitrust violations such as price fixing or market division, or goods that violate a standard applicable to them.

Content

US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

Acknowledgements

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations

"US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.

Banner photo by João Silas on Unsplash
Data from: San Francisco Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
DataSF
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
San Francisco
Description
Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.

This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.

This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until roughly two weeks before the current date. The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).

This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

https://cloud.google.com/bigquery/public-data/sfo-311

https://cloud.google.com/bigquery/public-data/sffd-service-calls

https://cloud.google.com/bigquery/public-data/sfpd-reports

https://cloud.google.com/bigquery/public-data/sfo-trees

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unsplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents has historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
gnomAD
console.cloud.google.com
Updated Jun 1, 2021
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Broad%20Institute%20of%20MIT%20and%20Harvard&hl=fr&inv=1&invt=Ab0asQ (2021). gnomAD [Dataset]. https://console.cloud.google.com/marketplace/product/broad-institute/gnomad?hl=fr
Explore at:
Dataset updated
Jun 1, 2021
Dataset provided by
Google (http://google.com/)
Description
The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated with a “_chr*” suffix). Utilizing the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support.

These public datasets are included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
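The chromosome sharding described above can be illustrated with a small helper that picks the shard for one chromosome, so a query touches one table instead of all 24. This is only a sketch under assumptions: the description gives the “_chr*” suffix pattern but not the exact names, so the suffix spelling (`_chr1` to `_chr22`, `_chrX`, `_chrY`) and the base table name used below are hypothetical.

```python
from typing import List


def chromosome_suffixes() -> List[str]:
    """List the 24 assumed shard suffixes: 22 autosomes plus X and Y."""
    names = [str(number) for number in range(1, 23)] + ["X", "Y"]
    return [f"_chr{name}" for name in names]


def shard_for(base_table: str, chromosome: str) -> str:
    """Return the name of the sharded table for one chromosome."""
    suffix = f"_chr{chromosome}"
    if suffix not in chromosome_suffixes():
        raise ValueError(f"unknown chromosome: {chromosome!r}")
    return base_table + suffix


if __name__ == "__main__":
    # "example_project.example_dataset.variants" is a made-up base name.
    print(shard_for("example_project.example_dataset.variants", "X"))
```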
MLB 2016 Pitch-by-Pitch
console.cloud.google.com
Updated Jul 5, 2020
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Sportradar&inv=1&invt=Ab1TUA (2020). MLB 2016 Pitch-by-Pitch [Dataset]. https://console.cloud.google.com/marketplace/product/sportradar-public-data/mlb-pitch-by-pitch
Explore at:
Dataset updated
Jul 5, 2020
Dataset provided by
Sportradar (http://sportradar.com/)
Google (http://google.com/)
Description
This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup event for each at-bat in the 2016 regular season), games_post_wide (every pitch, steal, or lineup event for each at-bat in the 2016 post season), and schedules (the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams.

Note: This data was built via a denormalization process over raw game log files which may contain scoring errors and in some cases missing data. For official scoring and statistical information please consult mlb.com, baseball-reference.com, or sportradar.com.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Open Images
kaggle.com
opendatalab.com
zip
Updated Feb 12, 2019
Cite
Google BigQuery (2019). Open Images [Dataset]. https://www.kaggle.com/datasets/bigquery/open-images
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

Labeled datasets are useful in machine learning research.

Content

This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.

Tables: 1) annotations_bbox 2) dict 3) images 4) labels

Update Frequency: Quarterly

Querying BigQuery Tables

Fork this kernel to get started.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images

https://cloud.google.com/bigquery/public-data/openimages

APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.

Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.

The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

Banner Photo by Mattias Diesel from Unsplash.

Inspiration

Which labels are in the dataset? Which labels have "bus" in their display names? How many images of a trolleybus are in the dataset? What are some landing pages of images with a trolleybus? Which images with cherries are in the training set?
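One of the questions above ("Which labels have "bus" in their display names?") can be tried locally on a toy table before running it against BigQuery. This is a hedged sketch only: it uses an in-memory SQLite table named after the `dict` table listed above, the column names `label_name` and `label_display_name` are assumptions, and the rows are made up.

```python
import sqlite3
from typing import List, Tuple


def labels_matching(conn: sqlite3.Connection, term: str) -> List[Tuple[str, str]]:
    """Return (label_name, label_display_name) rows whose display name contains term."""
    cursor = conn.execute(
        "SELECT label_name, label_display_name FROM dict "
        "WHERE LOWER(label_display_name) LIKE ? "
        "ORDER BY label_display_name",
        (f"%{term.lower()}%",),
    )
    return cursor.fetchall()


def make_toy_database() -> sqlite3.Connection:
    """Build an in-memory stand-in for the dict table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (label_name TEXT, label_display_name TEXT)")
    conn.executemany(
        "INSERT INTO dict VALUES (?, ?)",
        [("/m/example1", "Bus"),
         ("/m/example2", "Trolleybus"),
         ("/m/example3", "Cherry")],
    )
    return conn


if __name__ == "__main__":
    for row in labels_matching(make_toy_database(), "bus"):
        print(row)
```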
Company Data | Global Coverage | 65M+ Company profiles | Bi-weekly updates
datarade.ai
.json, .csv
Cite
Forager.ai, Company Data | Global Coverage | 65M+ Company profiles | Bi-weekly updates [Dataset]. https://datarade.ai/data-products/b2b-company-data-worldwide-61m-records-verified-updated-forager-ai-351c
Explore at:
Available download formats: .json, .csv
Dataset provided by
Forager.ai
Area covered
United States Minor Outlying Islands, Iran (Islamic Republic of), Guatemala, United States of America, Faroe Islands, Sint Maarten (Dutch part), Madagascar, Tunisia, Kyrgyzstan, Falkland Islands (Malvinas)
Description
Global B2B Company Database | 65M+ Verified Firms | Firmographics

Forget stale corporate directories – Forager.ai delivers living, breathing company intelligence trusted by VCs, Fortune 500 teams, and SaaS leaders. Our 65 million+ AI-validated company profiles are refreshed every 14 days to track leadership changes, tech migrations, and growth signals competitors miss.

Why This Outperforms Generic Firmographics

✅ AI That Works Like Your Best Analyst
Cross-references 12+ sources to:
✔ Flag companies hiring sales teams → Ready to buy
✔ Detect tech stack changes → Migration opportunities
✔ Identify layoffs/expansions → Timely outreach windows

✅ Freshness That Matters
We update 100% of records every 2-3 weeks – critical for tracking:

Funding round and revenue.

Company job posts

✅ Ethical & Audit-Ready
Full GDPR/CCPA compliance with:

Usage analytics dashboard

Your Secret Weapon for:

🔸 Sales Teams:
→ Identify high-growth targets 83% faster (employee growth + tech stack filters)
→ Prioritize accounts with "hiring spree" or "new funding" tags

🔸 Investors:
→ Track 18K+ private companies with revenue/employee alerts
→ Portfolio monitoring with 92% prediction accuracy on revenue shifts

🔸 Marketers:
→ ABM campaigns powered by technographics (Slack → Teams migrators)
→ Event targeting using travel patterns (HQ → conference city matches)

🔸 Data Teams:
→ Enrich Snowflake/Redshift warehouses via API
→ Build custom models with 150+ firmographic/technographic fields

Core Data Points
✔ Financial Health: Revenue ranges, funding history, growth rate estimates
✔ Tech Stack: CRM, cloud platforms, marketing tools, web technologies used
✔ People Moves: C-suite, employee headcount
✔ Expansion Signals: New offices, job postings

Enterprise-Grade Delivery

API: Credits system to find company using any field in schema; returns name, domain, industry, headcount, location, LinkedIn etc.

Cloud Sync: Auto-update Snowflake/Redshift/BigQuery

CRM Push: Direct to Salesforce/HubSpot/Pipedrive

Flat Files: CSV/JSON

Why Clients Never Go Back to Legacy Providers → 6-Month ROI Guarantee – We’ll beat your current vendor or extend your plan → Free Data Audit – Upload your CRM list → We’ll show gaps/opportunities → Live Training – Our analysts teach you to mine hidden insights

Keywords (Naturally Integrated): Global Company Data | Firmographic Database | B2B Technographic data | Private Company Intelligence | CRM Enrichment API | Sales Lead Database | VC Due Diligence Data | AI-Validated Firmographics | Market Expansion Signals | Competitor Benchmarking
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Explore at:
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
An Empirical Study of Proxy Smart Contracts at the Ethereum Ecosystem Scale
zenodo.org
pdf, zip
Updated Dec 28, 2024
Cite
Wuqi Zhang; Wuqi Zhang (2024). An Empirical Study of Proxy Smart Contracts at the Ethereum Ecosystem Scale [Dataset]. http://doi.org/10.5281/zenodo.14566032
Explore at:
Available download formats: pdf, zip
Unique identifier
https://doi.org/10.5281/zenodo.14566032
Dataset updated
Dec 28, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Wuqi Zhang; Wuqi Zhang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
# An Empirical Study of Proxy Smart Contracts at Ethereum Ecosystem Scale

In this work, we conduct the first comprehensive study on Ethereum proxies. We organize our data and code into three sections as follows, aligning with the structure of our paper.

* **1. Proxy Contract Preparation.** To collect a comprehensive dataset of proxies, we propose *ProxyEx*, the first framework designed to detect proxies directly from bytecode.
* **2. Logic Contract Preparation.** To analyze the logic contracts of proxies, we collect the transactions and traces of all the related proxies and extract the logic contracts from them.
* **3. Three Research Questions.** In this paper, we conduct the first systematic study on proxies on Ethereum, aiming to answer the following research questions.
  * RQ1: Statistics. How many proxies are there on Ethereum? How often do proxies modify their logic? How many transactions are executed on proxies?
  * RQ2: Purpose. What are the major purposes of implementing proxy patterns for smart contracts?
  * RQ3: Bugs and Pitfalls. What types of bugs and pitfalls can exist in proxies?

## 1. Proxy Contract Preparation
To facilitate proxy contract data collection, we design a system, *ProxyEx*, to detect proxy contracts from contract bytecode.

#### Environment Setup
First make sure you have the following things installed on your system:

* Boost libraries (Can be installed on Debian with apt install libboost-all-dev)

* Python 3.8

* Souffle 2.3 or 2.4

Now install the Souffle custom functors:

* run *cd proxy-contract-prep/proxyex/souffle-addon && make*

You should now be ready to run *ProxyEx*.

#### Step 1: unzip the contracts for getting bytecode
We collect all the on-chain smart contract bytecode as of September 10, 2023. In total, we have *62,578,635* smart contracts.

* download *contracts.zip* under *proxyex* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing) and place it under *proxy-contract-prep*
* run *unzip contracts.zip* under *proxy-contract-prep* ---> generate the contract bytecode under *proxy-contract-prep/contracts*

#### Step 2: run the proxy detection script in parallel
To speed up the detection process, we optimize it by running multiple python scripts.

* run *bash proxy.sh* under *proxy-contract-prep/scripts-run* ---> generate all the results under *proxy-contract-prep/version1*
* run *bash kill.sh* under *proxy-contract-prep/scripts-run* ---> kill all the running scripts

#### Step 3: analyze all the proxy detection results
We apply *ProxyEx* to the smart contract bytecode with a timeout of *60* seconds; there are *2,031,422* proxy addresses in total (3.25%). The average detection times for proxy and non-proxy contracts are *14.85* seconds and *3.88* seconds, respectively.

* download *first_contract.json* under *proxyex* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing) and place it under *proxy-contract-prep/scripts-analyze*
* run *python3 analyze_proxy.py* under *proxy-contract-prep/scripts-analyze* ---> generate all the results under *proxy-contract-prep/scripts-analyze/stats1* for later analysis.
* we have already uploaded our analysis results as *stats1.zip* to [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing); run *unzip stats1.zip* under *proxy-contract-prep/scripts-analyze* to get the results, for example *all_proxy.txt*, which lists all the 2,031,422 proxy addresses.
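The proxy share reported in this step can be re-derived from the two counts given in this section. A minimal sketch: the function name `proxy_share_percent` is ours, while both counts are taken from the text above.

```python
# Counts reported in this README.
TOTAL_CONTRACTS = 62_578_635   # all on-chain contracts as of September 10, 2023
PROXY_CONTRACTS = 2_031_422    # contracts flagged as proxies by ProxyEx


def proxy_share_percent(proxies: int, total: int) -> float:
    """Return the share of proxies among all contracts, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * proxies / total


if __name__ == "__main__":
    share = proxy_share_percent(PROXY_CONTRACTS, TOTAL_CONTRACTS)
    print(f"{share:.2f}% of contracts are proxies")
```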

#### Step 4: manually analyze 1k contracts for accuracy
To evaluate the effectiveness and performance of *ProxyEx*, we randomly sampled 1,000 contracts from our dataset. Our examination revealed *548* proxy addresses and *452* non-proxy addresses.
*ProxyEx* misclassified one proxy as non-proxy (false negative), indicating that our framework achieves *100%* precision and over *99%* recall.

* *proxy-contract-prep/1k.csv* displays our manually checked results of 1,000 randomly sampled contracts
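The precision and recall figures in this step follow from the sample counts. The sketch below assumes, as the stated 100% precision implies, that there were no false positives; the variable and function names are ours.

```python
# Manual labels for the 1,000 sampled contracts, as reported above.
ACTUAL_PROXIES = 548
ACTUAL_NON_PROXIES = 452

# One proxy was missed; zero false positives is an assumption implied by
# the reported 100% precision.
FALSE_NEGATIVES = 1
FALSE_POSITIVES = 0
TRUE_POSITIVES = ACTUAL_PROXIES - FALSE_NEGATIVES


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted proxies that are real proxies."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real proxies that were detected."""
    return tp / (tp + fn)


if __name__ == "__main__":
    print(f"precision = {precision(TRUE_POSITIVES, FALSE_POSITIVES):.4f}")
    print(f"recall    = {recall(TRUE_POSITIVES, FALSE_NEGATIVES):.4f}")
```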

## 2. Logic Contract Preparation
To extract logic contract addresses, we gather all the transaction traces associated with a *DELEGATECALL* sent from the proxy contracts. We collect a 3-tuple *{FromAddr, ToAddr, CallType}* for every trace from Google BigQuery APIs, which we subsequently aggregate into transactions. In total, we collect 172,709,392 transactions for all the 2,031,422 proxy contracts.

#### Step 1: extract transaction traces
We run the following SQL query to download all the traces related to our proxy contracts.

* run *SELECT * FROM `bigquery-public-data.crypto_ethereum.traces` WHERE from_address IN ( SELECT trace FROM `moonlit-ceiling-399321.gmugcp.traces` ) OR to_address IN ( SELECT trace FROM `moonlit-ceiling-399321.gmugcp.traces` ) ORDER BY transaction_hash*; here, *moonlit-ceiling-399321.gmugcp.traces* is the table consisting of all the proxy contract addresses from *proxy-contract-prep/scripts-analyze/stats1/all_proxy.txt*.
* the full set of transaction traces takes up around 1.3 TB of storage, so we cannot upload all of them here. We store a segment of the data in "logic-contract-prep/data/sample.json"
* you can fetch all the data using the url *https://storage.googleapis.com/tracesdata/xxx.json*, where *xxx* ranges from *000000000000* to *000000004429*.
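The file naming scheme in the last bullet can be expanded into a full list of URLs. This is a small sketch that only builds the URLs and downloads nothing; it assumes the range is inclusive at both ends and that file names are zero-padded to 12 digits, as the two endpoints suggest.

```python
from typing import List

# URL pattern from the README; {:012d} zero-pads the index to 12 digits.
URL_TEMPLATE = "https://storage.googleapis.com/tracesdata/{:012d}.json"
FIRST_INDEX = 0
LAST_INDEX = 4429  # assumed inclusive


def trace_urls(first: int = FIRST_INDEX, last: int = LAST_INDEX) -> List[str]:
    """Return the URLs of the trace files numbered first..last (inclusive)."""
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    return [URL_TEMPLATE.format(index) for index in range(first, last + 1)]


if __name__ == "__main__":
    urls = trace_urls()
    print(len(urls), "files")
    print(urls[0])
    print(urls[-1])
```

Given the roughly 1.3 TB total mentioned above, it is probably wise to try a small sub-range such as `trace_urls(0, 2)` first.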

#### Step 2: extract logic contracts
We aggregate the transaction traces into transactions and obtain the related logic contracts for every proxy contract, sorted by the timestamp (block number).

* run "analyze.py" under *logic-contract-prep/scripts-analyze* ---> generate all the results under *logic-contract-prep/scripts-analyze/impl.json*
* however, *impl.json* takes up 30 GB, which is too large to include here; therefore, we provide a sample in *logic-contract-prep/scripts-analyze/sample_impl.json*
* also, you can fetch the whole *impl.json* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing)
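The aggregation described in this step can be sketched as follows. This is an illustration under assumptions, not the authors' "analyze.py": the field names `from_address`, `to_address`, `call_type` and `block_number` are assumed names for the trace attributes, the function name is hypothetical, and the toy addresses are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def logic_contracts_by_proxy(traces: Iterable[dict],
                             proxies: Set[str]) -> Dict[str, List[str]]:
    """Map each proxy to its logic contracts, ordered by block number.

    Only DELEGATECALL traces sent from a known proxy are considered, and
    a logic address is recorded only when it differs from the previous one.
    """
    history: Dict[str, List[str]] = defaultdict(list)
    for trace in sorted(traces, key=lambda t: t["block_number"]):
        if trace["call_type"] != "delegatecall":
            continue
        proxy = trace["from_address"]
        if proxy not in proxies:
            continue
        logic = trace["to_address"]
        if not history[proxy] or history[proxy][-1] != logic:
            history[proxy].append(logic)
    return dict(history)


if __name__ == "__main__":
    # Toy traces for illustration only.
    toy_traces = [
        {"from_address": "0xproxy", "to_address": "0xlogicB",
         "call_type": "delegatecall", "block_number": 5},
        {"from_address": "0xproxy", "to_address": "0xlogicA",
         "call_type": "delegatecall", "block_number": 1},
        {"from_address": "0xproxy", "to_address": "0xother",
         "call_type": "call", "block_number": 3},
    ]
    print(logic_contracts_by_proxy(toy_traces, {"0xproxy"}))
```

Recording a logic address only when it changes keeps the per-proxy list short and makes upgrades (a switch from one logic contract to another) directly visible.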

## 3. Three Research Questions
#### RQ1 - Statistics
We perform statistical analysis of proxy contracts, covering bytecode duplication, transaction count, and lifespan.
* Bytecode Duplication: run "iv_rq1_figure3.py" under *three-research-questions/rq1/script/*, which relies on the "iv_rq1_figure3.csv" data file under *three-research-questions/rq1/data/* ---> generates figure 3 in the paper.
* Transaction Count: run "iv_rq1_figure4.py" under *three-research-questions/rq1/script/*, which relies on the "iv_rq1_figure4.txt" data file under *three-research-questions/rq1/data/* ---> generates figure 4 in the paper.
* Lifespan: run "iv_rq1_figure5.py" under *three-research-questions/rq1/script/*, which relies on the "iv_rq1_figure5.txt" data file under *three-research-questions/rq1/data/* ---> generates figure 5 in the paper.

#### RQ2 - Purposes
We conduct manual analysis to understand the purposes of proxy contracts and categorize them into the following four types.
* Upgradeability: run "v_rq2_figure6.py" under *three-research-questions/rq2/script/*, which relies on the "v_rq2_figure6.txt" data file under *three-research-questions/rq2/data/* ---> generates figure 6 in the paper.
* Extensibility: The 32 contracts identified by the detection algorithm for extensibility proxies are listed in "extensibility_proxies.txt", among which one proxy, `0x4deca517d6817b6510798b7328f2314d3003abac`, is the vulnerable proxy with a proxy-logic collision bug (labelled "Audius Hack").
* Code-sharing: The file "code_sharing.txt" contains the 1,137,317 code-sharing proxies and 3,309 code-sharing proxy clusters that we identified.
* Code-hiding: The file "code_hiding.txt" contains the 1,213 code-hiding proxies that we identified. The first column in the csv file is the proxy address, while the second column contains a list of tuples: `claimed logic address in EIP1967 slot`, `actual logic address in execution`, `the block where such discrepancy is observed`.

#### RQ3 - Bugs and Pitfalls

In RQ3 we conduct a semi-automated detection of bugs and pitfalls in proxies.
We leverage a set of automated helpers (as described in the paper) to help us prune non-vulnerable contracts before manual inspection.
The automated helpers can be found in the `pitfall-detection-helpers` folder.
Note that the final results are obtained faithfully through manual inspection. The helper scripts are only used for data processing, to reduce human effort.

* Proxy-logic collision:
  - the file "proxy_logic_collision.txt" shows the 32 proxies that we identified as well as our manual inspection results.
  - the file "proxy_logic_collision_detector_evaluation_sampled_txs.txt" lists the 100 transactions sampled to evaluate the reliability of our automated helper, which identifies storage slot read/write operations.
* Logic-logic collision:
  - the file "logic_logic_collision.txt" contains the 15 proxies that we identified to have logic-logic collisions.
  - the file "logic_logic_collision_detector_evaluation_sampled_contract_pairs.csv" lists the 100 new-version/old-version logic contract pairs sampled to evaluate the reliability of our automated helper, which identifies storage collisions between two logic contracts.
* Uninitialized contract:
  - the file "uninitialized.csv" contains 183 proxies that were not initialized in the same transaction as their deployment and may be at risk of a front-running attack. Whether they are still exploitable (i.e., can be re-initialized by malicious actors at present) is also labelled in the csv.
  - the file "identified_initialize_function_calldata.csv" lists the 100 logic contracts sampled to evaluate the quality of the `initialize` calldata extracted by our automated helper.
Austin Waste and Diversion
kaggle.com
Updated Aug 25, 2017
Cite
Jacob Boysen (2017). Austin Waste and Diversion [Dataset]. https://www.kaggle.com/datasets/jboysen/austin-waste/data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 25, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jacob Boysen
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Austin
Description
Context:

This dataset is trash. Who in Austin makes it, who takes it, and where does it go?

Content:

The data ranges from 2008 to 2016 and includes dropoff site, load id, time of load, type of load, weight of load, date, route number, and route type (recycling, street cleaning, garbage, etc.).

Acknowledgements:

This dataset was created by the Austin city government and hosted on Google Cloud Platform. You can use Kernels to analyze, share, and discuss this data on Kaggle, but if you’re looking for real-time updates and bigger data, check out the data on BigQuery, too.

Inspiration:

How much trash is Austin generating?

Which are the trashiest routes? Who recycles the best?

Any seasonal changes?

Try to predict trash route usage from historical trash data
CFPB Consumer Complaint Database
console.cloud.google.com
Updated Jan 25, 2020
+ more versions
Cite
Consumer Financial Protection Bureau (2020). CFPB Consumer Complaint Database [Dataset]. https://console.cloud.google.com/marketplace/product/cfpb/complaint-database
Dataset updated
Jan 25, 2020
Dataset provided by
Google (http://google.com/)
Description
The Consumer Complaint Database is a collection of complaints about consumer financial products and services that we sent to companies for response. Complaints are published after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first. Complaints referred to other regulators, such as complaints about depository institutions with less than $10 billion in assets, are not published in the Consumer Complaint Database.

This database is not a statistical sample of consumers' experiences in the marketplace. Complaints are not necessarily representative of all consumers' experiences and complaints do not constitute "information" for purposes of the Information Quality Act. Complaint volume should be considered in the context of company size and/or market share. For example, companies with more customers may have more complaints than companies with fewer customers. We encourage you to pair complaint data with public and private datasets for additional context.

The Bureau publishes the consumer's narrative description of his or her experience if the consumer opts to share it publicly and after the Bureau removes personal information. We don't verify all the allegations in complaint narratives. Unproven allegations in consumer narratives should be regarded as opinion, not fact. We do not adopt the views expressed and make no representation that consumers' allegations are accurate, clear, complete, or unbiased in substance or presentation. Users should consider what conclusions may be fairly drawn from complaints alone.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
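The advice to read complaint volume in the context of company size can be illustrated with a simple rate calculation. This is only a sketch: the function name, the per-100,000 scale, and the two companies with their figures are made up for illustration and do not come from the database.

```python
def complaints_per_100k(complaints: int, customers: int) -> float:
    """Complaints per 100,000 customers (an illustrative normalization)."""
    if customers <= 0:
        raise ValueError("customers must be positive")
    return complaints * 100_000 / customers


# Made-up figures for two hypothetical companies, not real data.
companies = {
    "Company A": (500, 2_000_000),  # many complaints, many customers
    "Company B": (120, 150_000),    # fewer complaints, far fewer customers
}

for name, (complaints, customers) in companies.items():
    rate = complaints_per_100k(complaints, customers)
    print(f"{name}: {complaints} complaints, {rate:.1f} per 100k customers")
```

In this made-up case the company with fewer raw complaints has the higher rate, which is the kind of reversal the description warns about.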
NYC Citi Bike Trips
console.cloud.google.com
Updated Jul 1, 2022
Cite
City of New York (2022). NYC Citi Bike Trips [Dataset]. https://console.cloud.google.com/marketplace/product/city-of-new-york/nyc-citi-bike
Dataset updated
Jul 1, 2022
Dataset provided by
Google (http://google.com/)
Area covered
New York
Description
Citi Bike is the nation's largest bike share program, with 10,000 bikes and 600 stations across Manhattan, Brooklyn, Queens, and Jersey City. This dataset includes Citi Bike trips since Citi Bike launched in September 2013 and is updated daily. The data has been processed by Citi Bike to remove trips that are taken by staff to service and inspect the system, as well as any trips below 60 seconds in length, which are considered false starts. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
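The false-start rule described above can be sketched as a simple filter. This is a minimal illustration, not Citi Bike's actual processing: the `Trip` record and its field names are hypothetical, and only the 60-second threshold comes from the description.

```python
from dataclasses import dataclass

# Threshold from the description: trips below 60 seconds are false starts.
FALSE_START_SECONDS = 60


@dataclass
class Trip:
    # Hypothetical record layout, for illustration only.
    bike_id: int
    duration_seconds: int


def drop_false_starts(trips):
    """Keep only trips that last at least FALSE_START_SECONDS."""
    return [t for t in trips if t.duration_seconds >= FALSE_START_SECONDS]


trips = [Trip(1, 45), Trip(2, 60), Trip(3, 900)]
kept = drop_false_starts(trips)
print([t.bike_id for t in kept])
```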

Transporte Rodoviário: Histórico de GPS do BRT

data.rio
hub.arcgis.com

Updated Jun 8, 2022

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Transporte Rodoviário: Histórico de GPS do BRT [Dataset]. https://www.data.rio/documents/a17608e589864376bfad313e026c4681

Dataset updated

Jun 8, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

Historical geographic position data for BRT vehicles.

The complete data are available for querying and download in the data.rio data lake. Data are captured every minute and processed every hour. Data are subject to change, such as corrections of capture gaps and/or processing adjustments.

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.transporte_rodoviario_municipal.gps_brt`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000")

Temporal coverage: 24/11/2021 to present

Update frequency: Hourly

Managing agency: Secretaria Municipal de Transportes (SMTR)

Columns

Name | Description
modo | BRT; this table contains only this mode.
timestamp_gps | Timestamp at which the GPS signal was emitted.
data | Date of the GPS signal emission timestamp.
hora | Time of the GPS signal emission timestamp.
id_veiculo | Vehicle identifier code (order number).
servico | Service operated by the vehicle.
latitude | Part of the geographic coordinate (y axis) in decimal degrees (EPSG:4326 - WGS84).
longitude | Part of the geographic coordinate (x axis) in decimal degrees (EPSG:4326 - WGS84).
flag_em_movimento | Vehicles with 'velocidade' below 'velocidade_limiar_parado' are considered stopped (false). Otherwise, they are considered moving (true).
tipo_parada | Identifies vehicles stopped at terminals or garages.
flag_linha_existe_sigmob | Flag indicating whether the reported line exists in SIGMOB.
velocidade_instantanea | Instantaneous speed of the vehicle, as reported by the GPS (km/h).
velocidade_estimada_10_min | Average speed over the last 10 minutes of operation (km/h).
distancia | Distance from the previous GPS position to the current position (m).
versao | Data version control code (GitHub SHA).
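The rule given for `flag_em_movimento` can be sketched as a threshold comparison. This is a minimal sketch under assumptions: the function below is hypothetical, and the 3.0 km/h value used in the example is an assumed threshold, since the source names `velocidade_limiar_parado` without stating its value.

```python
def flag_em_movimento(velocidade: float, velocidade_limiar_parado: float) -> bool:
    """Return False (stopped) when speed is below the threshold, True (moving) otherwise."""
    return velocidade >= velocidade_limiar_parado


# Assumed threshold for illustration only; the real value is not given in the source.
LIMIAR_KMH = 3.0

for speed in (0.0, 2.9, 3.0, 40.0):
    print(speed, flag_em_movimento(speed, LIMIAR_KMH))
```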

Publisher details

Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
E-mail: dados.smtr@prefeitura.rio

Meio Ambiente: Taxa de Precipitação (GOES-16)

data.rio
hub.arcgis.com

Updated Jun 2, 2022

+ more versions

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Taxa de Precipitação (GOES-16) [Dataset]. https://www.data.rio/documents/48c0210e96074b48b401ec2fa4ad99b3

Dataset updated

Jun 2, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

Estimated precipitation rate for areas of southeastern Brazil. Estimates are made hourly, and each record contains the data for one estimate. Each area is a square with 4 km sides. Data collected by the GOES-16 satellite.

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000")

Temporal coverage: From 2020 to the current date

Update frequency: Daily

Managing agency: Centro de Operações da Prefeitura do Rio (COR)




Columns

Name | Description
latitude | Latitude of the center of the area.
longitude | Longitude of the center of the area.
rrqpe | Estimated precipitation rate, measured in millimeters per hour.
primary_key | Primary key created by concatenating the date, time, latitude, and longitude columns. Used to avoid duplicate records.
horario | Time at which the measurement was taken.
data_particao | Date on which the measurement was taken.
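The `primary_key` column is described as a concatenation of date, time, latitude, and longitude used to avoid duplicates. A minimal sketch of that idea follows; the separator, the value formatting, and both function names are assumptions, since the source does not say how the real key is assembled.

```python
from datetime import date, time


def build_primary_key(data_particao: date, horario: time,
                      latitude: float, longitude: float, sep: str = "_") -> str:
    """Concatenate date, time, latitude and longitude into one key (assumed format)."""
    parts = [data_particao.isoformat(), horario.isoformat(),
             repr(latitude), repr(longitude)]
    return sep.join(parts)


def drop_duplicates(records):
    """Keep the first record seen for each key; records are dicts keyed by column name."""
    seen = set()
    unique = []
    for rec in records:
        key = build_primary_key(rec["data_particao"], rec["horario"],
                                rec["latitude"], rec["longitude"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


print(build_primary_key(date(2022, 6, 2), time(13, 0), -22.9, -43.2))
```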







Publisher details

Name: Patrícia Catandi
E-mail: patriciabcatandi@gmail.com

Meio Ambiente: Estações pluviométricas (AlertaRio)

datario-pcrj.hub.arcgis.com
data.rio

Updated Jun 2, 2022

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Estações pluviométricas (AlertaRio) [Dataset]. https://datario-pcrj.hub.arcgis.com/documents/cc4863712d65418abd8b2063a50bf453

Dataset updated

Jun 2, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

Data on the rain gauge stations of AlertaRio (Sistema Alerta Rio da Prefeitura do Rio de Janeiro) in the city of Rio de Janeiro.

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.meio_ambiente_clima.estacoes_alertario`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000")

Temporal coverage: N/A

Update frequency: Annual

Managing agency: COR




Columns

Name | Description
x | UTM X (SAD69 Zone 23).
longitude | Longitude where the station is located.
id_estacao | Station ID defined by AlertaRIO.
estacao | Station name.
latitude | Latitude where the station is located.
cota | Height, in meters, at which the station is located.
endereco | Full address of the station.
situacao | Indicates whether the station is operating or faulty.
data_inicio_operacao | Date the station began operating.
data_fim_operacao | Date the station stopped operating.
data_atualizacao | Last date on which the data about the operation date were updated.
y | UTM Y (SAD69 Zone 23).







Publisher details

Name: Patricia Catandi
E-mail: patriciabcatandi@gmail.com

Meio Ambiente: Estações meteorológicas (INMET/BDMET)

data.rio
datario-pcrj.hub.arcgis.com
+1more

Updated Jun 3, 2022

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Estações meteorológicas (INMET/BDMET) [Dataset]. https://www.data.rio/documents/f14b1ed52be447379383acbb96353e1c

Dataset updated

Jun 3, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

Data on the weather stations of INMET (Instituto Nacional de Meteorologia) in the city of Rio de Janeiro.

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.meio_ambiente_clima.estacoes_inmet`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000")

Temporal coverage: N/A

Update frequency: Never

Managing agency: INMET




Columns

Name | Description
id_municipio | 7-digit IBGE municipality code.
latitude | Latitude where the station is located.
data_inicio_operacao | Date the station began operating.
data_fim_operacao | Date the station stopped operating.
situacao | Indicates whether the station is operating or faulty.
tipo_estacao | Indicates whether the station is automatic or manual. May contain nulls.
entidade_responsavel | Entity responsible for the station.
data_atualizacao | Last date on which the data about the operation date were updated.
longitude | Longitude where the station is located.
sigla_uf | State abbreviation.
id_estacao | Station ID defined by INMET.
nome_estacao | Station name.







Publisher details

Name: Patricia Catandi
E-mail: patriciabcatandi@gmail.com

Administração de Serviços Públicos: Chamados feitos ao 1746

hub.arcgis.com
datario-pcrj.hub.arcgis.com

Updated Jun 2, 2022

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Administração de Serviços Públicos: Chamados feitos ao 1746 [Dataset]. https://hub.arcgis.com/documents/52b6bd003abf4b8995ec9860e65a82c5

Dataset updated

Jun 2, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

Service requests made to 1746. The data include requests since March 2011, when the 1746 project began.

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.administracao_servicos_publicos.chamado_1746`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000")

Temporal coverage: From March 2011

Update frequency: Daily

Managing agency: SEGOVI




Columns

Name | Description
id_chamado | Unique identifier of the request in the database.
data_inicio | Date the request was opened, i.e., when the operator registers the request.
data_fim | Date the request was closed. A request is closed when it has been fulfilled or when it is determined that it cannot be fulfilled.
id_bairro | Unique identifier, in the database, of the neighborhood where the event that generated the request occurred.
id_territorialidade | Unique identifier, in the database, of the territoriality where the event that generated the request occurred. A territoriality is a region of the city of Rio de Janeiro for which a specific agency is responsible. Example: CDURP, which is responsible for the port region of Rio de Janeiro.
id_logradouro | Unique identifier, in the database, of the street (logradouro) where the event that generated the request occurred.
numero_logradouro | Street number where the event that generated the request occurred.
id_unidade_organizacional | Unique identifier, in the database, of the agency that carries out the request. For example: the identifier of COMLURB when the request concerns urban cleaning.
nome_unidade_organizacional | Name of the agency that carries out the request. For example: COMLURB when the request concerns urban cleaning.
unidade_organizadional_ouvidoria | Boolean indicating whether the citizen's request was made through the Ouvidoria: 1 if yes, 0 if no.
categoria | Category of the request. Examples: service, information, suggestion, compliment, complaint, criticism.
id_tipo | Unique identifier, in the database, of the request type. E.g., public lighting.
tipo | Name of the request type. E.g., public lighting.
id_subtipo | Unique identifier, in the database, of the request subtype. E.g., repair of a lamp that is out.
subtipo | Name of the request subtype. E.g., repair of a lamp that is out.
status | Status of the request. E.g., closed with solution, open and in progress, pending.
longitude | Longitude of the location of the event that prompted the request.
latitude | Latitude of the location of the event that prompted the request.
data_alvo_finalizacao | Target date for fulfilling the request. If prazo_tipo is D, it stays blank until the diagnosis is made.
data_alvo_diagnostico | Target date for diagnosing the service. If prazo_tipo is F, this date stays blank.
data_real_diagnostico | Date on which the service was diagnosed. If prazo_tipo is F, this date stays blank.
tempo_prazo | Deadline for the service to be performed, in days or hours after the request is opened. If there is a diagnosis, the deadline counts from when the diagnosis is made.
prazo_unidade | Unit of time used for the deadline: days (D) or hours (H).
prazo_tipo | Diagnosis (D) or completion (F). Indicates whether the request needs a diagnosis. Some services need an assessment before they can be performed, in which case a diagnosis is made. For example, for tree pruning, an environmental engineer must verify whether the pruning is needed.
id_unidade_organizacional_mae | ID of the parent organizational unit of the agency that carries out the request. For example: "CVA - Coordenação de Vigilância de Alimentos" carries out the request and reports to the parent organizational unit "IVISA-RIO - Instituto Municipal de Vigilância Sanitária, de Zoonoses e de Inspeção Agropecuária". The column refers to the ID of the latter.
situacao | Indicates whether the request has been closed.
tipo_situacao | Indicates the current status of the request among the categories Atendido, Atendido parcialmente, Não atendido, Não constatado, and Andamento.
dentro_prazo | Indicates whether the target completion date of the request is still within the stipulated deadline.
justificativa_status | Justification that the agencies use when setting the status. Example: SEM POSSIBILIDADE DE ATENDIMENTO, with the justification that the matter is outside the municipality's area of operation.
reclamacoes | Number of complaints.
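The interplay of `tempo_prazo`, `prazo_unidade`, and `prazo_tipo` can be sketched as a small date calculation. This is one reading of the column descriptions above, not the 1746 system's actual logic; the function name and the choice to return `None` while a diagnosis is pending are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def target_completion(data_inicio: datetime, tempo_prazo: int, prazo_unidade: str,
                      prazo_tipo: str,
                      data_real_diagnostico: Optional[datetime] = None) -> Optional[datetime]:
    """Estimate the target completion date from the deadline columns (assumed logic)."""
    if prazo_unidade == "D":
        delta = timedelta(days=tempo_prazo)
    elif prazo_unidade == "H":
        delta = timedelta(hours=tempo_prazo)
    else:
        raise ValueError("prazo_unidade must be 'D' (days) or 'H' (hours)")

    if prazo_tipo == "F":
        # No diagnosis needed: the deadline counts from the opening of the request.
        return data_inicio + delta
    if prazo_tipo == "D":
        # Diagnosis needed: the target stays blank until the diagnosis is made.
        if data_real_diagnostico is None:
            return None
        return data_real_diagnostico + delta
    raise ValueError("prazo_tipo must be 'D' (diagnosis) or 'F' (completion)")


print(target_completion(datetime(2022, 6, 1, 8, 0), 48, "H", "F"))
```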







Publisher details

Name: Patricia Catandi
E-mail: patriciabcatandi@gmail.com

Transporte Rodoviário: Histórico de GPS dos ônibus (SPPO)

data.rio
hub.arcgis.com
+1more

Updated Jun 8, 2022

+ more versions

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Transporte Rodoviário: Histórico de GPS dos ônibus (SPPO) [Dataset]. https://www.data.rio/documents/6409ea499d474bfeb4063cfc31203403

Dataset updated

Jun 8, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.transporte_rodoviario_municipal.gps_onibus`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_onibus` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.transporte_rodoviario_municipal.gps_onibus` LIMIT 1000")

Temporal coverage: 01/03/2021 to present

Update frequency: Hourly

Managing agency: Secretaria Municipal de Transportes




Columns

Name | Description
modo | SPPO; this table contains only this mode.
timestamp_gps | Timestamp at which the GPS signal was emitted.
data | Date of the GPS signal emission timestamp.
hora | Time of the GPS signal emission timestamp.
id_veiculo | Vehicle identifier code (order number).
servico | Service operated by the vehicle.
latitude | Part of the geographic coordinate (y axis) in decimal degrees (EPSG:4326 - WGS84).
longitude | Part of the geographic coordinate (x axis) in decimal degrees (EPSG:4326 - WGS84).
flag_em_movimento | Vehicles with 'velocidade' below 'velocidade_limiar_parado' are considered stopped (false). Otherwise, they are considered moving (true).
tipo_parada | Identifies vehicles stopped at terminals or garages.
flag_linha_existe_sigmob | Flag indicating whether the reported line exists in SIGMOB.
velocidade_instantanea | Instantaneous speed of the vehicle, as reported by the GPS (km/h).
velocidade_estimada_10_min | Average speed over the last 10 minutes of operation (km/h).
distancia | Distance from the previous GPS position to the current position (m).
fonte_gps | Provider of the GPS data (zirix or conecta).
versao | Data version control code (GitHub SHA).
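A value like `distancia` (meters between consecutive GPS positions) can be approximated from the `latitude` and `longitude` columns. The sketch below uses the haversine great-circle formula, which is a swapped-in technique: the source does not say how `distancia` is actually computed, and the function name and the mean Earth radius constant are choices made here for illustration.

```python
import math

# Mean Earth radius in meters (a common convention; an assumption here).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two nearby points in Rio de Janeiro, chosen arbitrarily for illustration.
print(round(haversine_m(-22.9000, -43.2000, -22.9010, -43.2000), 1))
```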







Publisher details

Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
E-mail: dados.smtr@prefeitura.rio

Dados do sistema Comando (COR): ocorrencias

data.rio
datario-pcrj.hub.arcgis.com
+1more

Updated Oct 5, 2022

Cite

Prefeitura da Cidade do Rio de Janeiro (2022). Dados do sistema Comando (COR): ocorrencias [Dataset]. https://www.data.rio/documents/f9ddaeb4ac754975846716f084645f3d

Dataset updated

Oct 5, 2022

Dataset authored and provided by

Prefeitura da Cidade do Rio de Janeiro

License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically

Description

Occurrences triggered by the COR since 2015. An occurrence in the city of Rio de Janeiro is an event that requires monitoring and, in most cases, action by the PCRJ. Examples: a pothole in the road, pooled water, a mechanical breakdown. An open occurrence is one that has not yet been resolved. The data can also be accessed through the Escritório de Dados API: https://api.dados.rio/v1/

How to access

On this page

Here you will find a button to download the data as a gzip-compressed CSV file.

BigQuery

    SELECT *
    FROM `datario.adm_cor_comando.ocorrencias`
    LIMIT 1000

The table can also be opened directly in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>",
    )

R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql("SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000")

Temporal coverage: Not provided.

Update frequency: Daily

Managing agency: COR




Columns

Name | Description
data_inicio | Date and time the event was registered at the PCRJ.
data_fim | Date and time the event was closed at the PCRJ. An event is closed when it is resolved. This attribute is empty while the event is open.
bairro | Neighborhood where the event occurred.
id_pop | POP identifier.
status | Event status (ABERTO, FECHADO).
gravidade | Event severity (BAIXO, MEDIO, ALTO, CRITICO).
prazo | Expected time to resolve the event (CURTO, MEDIO (more than 3 days), LONGO (more than 5 days)).
latitude | Latitude, in WGS-84, where the event occurred.
longitude | Longitude, in WGS-84, where the event occurred.
id_evento | Event identifier.
descricao | Event description.
tipo | Event type (PRIMARIO, SECUNDARIO).
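The description of `data_fim` (empty while an event is open) suggests a simple way to separate open from closed occurrences and to measure how long closed ones took. A minimal sketch, assuming timestamps have already been parsed into `datetime` values and that an empty `data_fim` is represented as `None`; both function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_open(data_fim: Optional[datetime]) -> bool:
    """An occurrence is open while data_fim is empty."""
    return data_fim is None


def resolution_hours(data_inicio: datetime, data_fim: Optional[datetime]) -> Optional[float]:
    """Hours between registration and closing, or None if still open."""
    if data_fim is None:
        return None
    return (data_fim - data_inicio).total_seconds() / 3600


start = datetime(2022, 10, 5, 8, 0)
print(is_open(None), resolution_hours(start, datetime(2022, 10, 5, 11, 30)))
```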







Publisher details

Name: Patrícia Catandi
E-mail: patriciabcatandi@gmail.com

Cite

Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/bigquery/patents

Google Patents Public Data

Worldwide bibliographic and US patent publications (BigQuery)

173 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (0 bytes)

Dataset updated

Sep 19, 2018

Dataset provided by

Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)

Authors

Google BigQuery

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

Content

The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

Acknowledgements

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

Banner photo by Helloquence on Unsplash
