21 datasets found
  1. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/bigquery/patents
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started accessing data in the BigQuery dataset by writing SQL queries with the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents
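
    A minimal sketch of a first query with the google-cloud-bigquery Python client; the publications table name is an assumption based on the dataset path above, and the billing project is a placeholder:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Table name assumed from the patents-public-data:patents dataset above.
        sql = """
            SELECT country_code, COUNT(*) AS publications
            FROM `patents-public-data.patents.publications`
            GROUP BY country_code
            ORDER BY publications DESC
            LIMIT 10
        """
        print(client.query(sql).to_dataframe())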

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  2. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&inv=1&invt=AbzttQ (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It's a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset, so you can query the dataset and generate reports but cannot complete administrative tasks. Data for some fields is obfuscated (such as fullVisitorId) or removed (such as clientId, adWordsClickInfo, and geoNetwork). "Not available in demo dataset" will be returned for STRING values and "null" for INTEGER values when querying fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
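
    As a minimal sketch, the daily session tables can be queried with a table wildcard and a _TABLE_SUFFIX filter using the google-cloud-bigquery Python client (the billing project below is a placeholder):

        from google.cloud import bigquery

        # Placeholder billing project; the wildcard matches the daily
        # ga_sessions_YYYYMMDD shards in the public dataset.
        client = bigquery.Client(project="your-billing-project")
        sql = """
            SELECT channelGrouping,
                   SUM(totals.visits) AS visits,
                   SUM(totals.transactions) AS transactions
            FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
            WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
            GROUP BY channelGrouping
            ORDER BY visits DESC
        """
        print(client.query(sql).to_dataframe())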

  3. Intellectual Property Investigations by the USITC

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Intellectual Property Investigations by the USITC [Dataset]. https://www.kaggle.com/bigquery/usitc-investigations
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers, or consignees, of articles which infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked, or misbranded goods; sales of goods at unfairly low prices; other antitrust violations such as price fixing or market division; or goods that violate a standard applicable to such goods.

    Content

    US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
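
    As a minimal sketch, you can enumerate the dataset's tables with the google-cloud-bigquery Python client before writing queries (the billing project is a placeholder; individual table names are not documented in this listing):

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # The dataset path comes from the Data Origin link below; listing
        # tables avoids guessing undocumented table names.
        for table in client.list_tables("patents-public-data.usitc_investigations"):
            print(table.table_id)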

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations

    "US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.

    Banner photo by João Silas on Unsplash

  4. Data from: San Francisco Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    DataSF
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    San Francisco
    Description

    Context

    DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

    https://datasf.org/about/

    Content

    This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

    • This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
    • This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
    • This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (with a roughly two-week lag). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
    • This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.
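
    For example, a hedged sketch counting 311 requests by complaint type with the google-cloud-bigquery Python client; the mirrored dataset path and the complaint_type column are assumptions, so verify them in the BigQuery console first:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Dataset and column names are assumptions based on this listing.
        sql = """
            SELECT complaint_type, COUNT(*) AS requests
            FROM `bigquery-public-data.san_francisco.311_service_requests`
            GROUP BY complaint_type
            ORDER BY requests DESC
            LIMIT 10
        """
        print(client.query(sql).to_dataframe())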

    Acknowledgements

    http://datasf.org/

    Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @meric from Unsplash.

    Inspiration

    Which neighborhoods have the highest proportion of offensive graffiti?

    Which complaint is most likely to be made using Twitter and in which neighborhood?

    What are the most complained about Muni stops in San Francisco?

    What are the top 10 incident types that the San Francisco Fire Department responds to?

    How many medical incidents and structure fires are there in each neighborhood?

    What’s the average response time for each type of dispatched vehicle?

    Which category of police incidents have historically been the most common in San Francisco?

    What were the most common police incidents in the category of LARCENY/THEFT in 2016?

    Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

    What is the average tree diameter?

    What is the highest number of a particular species of tree planted in a single year?

    Which San Francisco locations feature the largest number of trees?

  5. gnomAD

    • console.cloud.google.com
    Updated Jun 1, 2021
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Broad%20Institute%20of%20MIT%20and%20Harvard&hl=fr&inv=1&invt=Ab0asQ (2021). gnomAD [Dataset]. https://console.cloud.google.com/marketplace/product/broad-institute/gnomad?hl=fr
    Explore at:
    Dataset updated
    Jun 1, 2021
    Dataset provided by
    Google (http://google.com/)
    Description

    The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated with a "_chr*" suffix). Utilizing the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms' annotation support. These public datasets are included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
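
    A sketch of querying a single chromosome shard with the google-cloud-bigquery Python client; the dataset and table names below are assumptions based on the "_chr*" sharding described above, so check the actual names in the BigQuery console:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Table name assumed from the "_chr*" sharding convention; adjust
        # to the release and chromosome you need.
        sql = """
            SELECT COUNT(*) AS variant_count
            FROM `bigquery-public-data.gnomAD.v3_genomes__chr21`
        """
        print(client.query(sql).to_dataframe())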

  6. MLB 2016 Pitch-by-Pitch

    • console.cloud.google.com
    Updated Jul 5, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Sportradar&inv=1&invt=Ab1TUA (2020). MLB 2016 Pitch-by-Pitch [Dataset]. https://console.cloud.google.com/marketplace/product/sportradar-public-data/mlb-pitch-by-pitch
    Explore at:
    Dataset updated
    Jul 5, 2020
    Dataset provided by
    Sportradar (http://sportradar.com/)
    Google (http://google.com/)
    Description

    This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup event for each at-bat in the 2016 regular season), games_post_wide (every pitch, steal, or lineup event for each at-bat in the 2016 post season), and schedules (the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams. Note: This data was built via a denormalization process over raw game log files, which may contain scoring errors and, in some cases, missing data. For official scoring and statistical information please consult mlb.com, baseball-reference.com, or sportradar.com. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
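
    A sketch counting recorded events per game with the google-cloud-bigquery Python client; the bigquery-public-data.baseball dataset path and the gameId column are assumptions to verify before running:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Dataset path and column name are assumptions based on this listing.
        sql = """
            SELECT gameId, COUNT(*) AS events
            FROM `bigquery-public-data.baseball.games_wide`
            GROUP BY gameId
            ORDER BY events DESC
            LIMIT 10
        """
        print(client.query(sql).to_dataframe())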

  7. Open Images

    • kaggle.com
    • opendatalab.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Open Images [Dataset]. https://www.kaggle.com/datasets/bigquery/open-images
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Labeled datasets are useful in machine learning research.

    Content

    This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.

    Tables: 1) annotations_bbox 2) dict 3) images 4) labels

    Update Frequency: Quarterly

    Querying BigQuery Tables

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images

    https://cloud.google.com/bigquery/public-data/openimages

    APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.

    Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.

    The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

    Banner Photo by Mattias Diesel from Unsplash.

    Inspiration

    Which labels are in the dataset?

    Which labels have "bus" in their display names? (See the sketch below.)

    How many images of a trolleybus are in the dataset?

    What are some landing pages of images with a trolleybus?

    Which images with cherries are in the training set?
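
    A sketch for the "bus" question using the google-cloud-bigquery Python client; the dict table's column names are assumptions, so confirm the schema first:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Column names assumed: the dict table is described as mapping
        # label codes to human-readable display names.
        sql = """
            SELECT label_name, label_display_name
            FROM `bigquery-public-data.open_images.dict`
            WHERE LOWER(label_display_name) LIKE '%bus%'
        """
        print(client.query(sql).to_dataframe())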

  8. Company Data | Global Coverage | 65M+ Company profiles | Bi-weekly updates

    • datarade.ai
    .json, .csv
    Cite
    Forager.ai, Company Data | Global Coverage | 65M+ Company profiles | Bi-weekly updates [Dataset]. https://datarade.ai/data-products/b2b-company-data-worldwide-61m-records-verified-updated-forager-ai-351c
    Explore at:
    Available download formats: .json, .csv
    Dataset provided by
    Forager.ai
    Area covered
    United States Minor Outlying Islands, Iran (Islamic Republic of), Guatemala, United States of America, Faroe Islands, Sint Maarten (Dutch part), Madagascar, Tunisia, Kyrgyzstan, Falkland Islands (Malvinas)
    Description

    Global B2B Company Database | 65M+ Verified Firms | Firmographics Forget stale corporate directories – Forager.ai delivers living, breathing company intelligence trusted by VCs, Fortune 500 teams, and SaaS leaders. Our 65 million+ AI-validated company profiles are refreshed every 14 days to track leadership changes, tech migrations, and growth signals competitors miss.

    Why This Outperforms Generic Firmographics ✅ AI That Works Like Your Best Analyst Cross-references 12+ sources to: ✔ Flag companies hiring sales teams → Ready to buy ✔ Detect tech stack changes → Migration opportunities ✔ Identify layoffs/expansions → Timely outreach windows

    ✅ Freshness That Matters We update 100% of records every 2-3 weeks – critical for tracking:

    Funding rounds and revenue

    Company job postings

    ✅ Ethical & Audit-Ready Full GDPR/CCPA compliance with:

    Usage analytics dashboard

    Your Secret Weapon for: 🔸 Sales Teams: → Identify high-growth targets 83% faster (employee growth + tech stack filters) → Prioritize accounts with "hiring spree" or "new funding" tags

    🔸 Investors: → Track 18K+ private companies with revenue/employee alerts → Portfolio monitoring with 92% prediction accuracy on revenue shifts

    🔸 Marketers: → ABM campaigns powered by technographics (Slack → Teams migrators) → Event targeting using travel patterns (HQ → conference city matches)

    🔸 Data Teams: → Enrich Snowflake/Redshift warehouses via API → Build custom models with 150+ firmographic/technographic fields

    Core Data Points ✔ Financial Health: Revenue ranges, funding history, growth rate estimates ✔ Tech Stack: CRM, cloud platforms, marketing tools, Web technologies used. ✔ People Moves: C-suite, Employees headcount ✔ Expansion Signals: New offices, job postings.

    Enterprise-Grade Delivery

    API: Credits system to find company using any field in schema; returns name, domain, industry, headcount, location, LinkedIn etc.

    Cloud Sync: Auto-update Snowflake/Redshift/BigQuery

    CRM Push: Direct to Salesforce/HubSpot/Pipedrive

    Flat Files: CSV/JSON

    Why Clients Never Go Back to Legacy Providers → 6-Month ROI Guarantee – We’ll beat your current vendor or extend your plan → Free Data Audit – Upload your CRM list → We’ll show gaps/opportunities → Live Training – Our analysts teach you to mine hidden insights

    Keywords (Naturally Integrated): Global Company Data | Firmographic Database | B2B Technographic data | Private Company Intelligence | CRM Enrichment API | Sales Lead Database | VC Due Diligence Data | AI-Validated Firmographics | Market Expansion Signals | Competitor Benchmarking

  9. MIMIC-IV

    • physionet.org
    Updated Oct 11, 2024
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
    Explore at:
    Dataset updated
    Oct 11, 2024
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
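
    MIMIC-IV requires credentialed PhysioNet access, so any BigQuery use first requires signing the data use agreement and linking a Google account. Under that assumption, a hedged sketch of counting admissions (the physionet-data project path and table name are assumptions):

        from google.cloud import bigquery

        # Requires credentialed PhysioNet access linked to your Google
        # account; the project and table paths below are assumptions.
        client = bigquery.Client(project="your-billing-project")
        sql = """
            SELECT admission_type, COUNT(*) AS n
            FROM `physionet-data.mimiciv_hosp.admissions`
            GROUP BY admission_type
            ORDER BY n DESC
        """
        print(client.query(sql).to_dataframe())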

  10. An Empirical Study of Proxy Smart Contracts at the Ethereum Ecosystem Scale

    • zenodo.org
    pdf, zip
    Updated Dec 28, 2024
    Cite
    Wuqi Zhang; Wuqi Zhang (2024). An Empirical Study of Proxy Smart Contracts at the Ethereum Ecosystem Scale [Dataset]. http://doi.org/10.5281/zenodo.14566032
    Explore at:
    Available download formats: pdf, zip
    Dataset updated
    Dec 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wuqi Zhang; Wuqi Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # An Empirical Study of Proxy Smart Contracts at Ethereum Ecosystem Scale

    In this work, we conduct the first comprehensive study on Ethereum proxies. We organize our data and code into three sections as follows, aligning with the structure of our paper.

    * **1. Proxy Contract Preparation.** To collect a comprehensive dataset of proxies, we propose *ProxyEx*, the first framework designed to detect proxy directly from bytecode.
    * **2. Logic Contract Preparation.** To analyze the logic contracts of proxies, we extract the transactions and traces of all the related proxies in order to identify their logic contracts.
    * **3. Three Research Questions.** In this paper, we conduct the first systematic study on proxies on Ethereum, aiming to answer the following research questions.
    * RQ1: Statistics. How many proxies are there on Ethereum? How often do proxies modify their logic? How many transactions are executed on proxies?
    * RQ2: Purpose. What are the major purposes of implementing proxy patterns for smart contracts?
    * RQ3: Bugs and Pitfalls. What types of bugs and pitfalls can exist in proxies?


    ## 1. Proxy Contract Preparation
    To facilitate proxy contract data collection, we design a system, *ProxyEx*, to detect proxy contracts from contract bytecode.

    #### Environment Setup
    First make sure you have the following things installed on your system:

    * Boost libraries (Can be installed on Debian with apt install libboost-all-dev)

    * Python 3.8

    * Souffle 2.3 or 2.4

    Now install the Souffle custom functors, and you should be ready to run *ProxyEx*.

    * run *cd proxy-contract-prep/proxyex/souffle-addon && make*


    #### Step 1: unzip the contracts to get the bytecode
    We collect all the on-chain smart contract bytecode as of September 10, 2023. In total, we have *62,578,635* smart contracts.

    * download *contracts.zip* under *proxyex* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing) and place it under *proxy-contract-prep*
    * run *unzip contracts.zip* under *proxy-contract-prep* ---> generate the contract bytecode under *proxy-contract-prep/contracts*

    #### Step 2: run the proxy detection script in parallel
    To speed up the detection process, we run multiple Python scripts in parallel.

    * run *bash proxy.sh* under *proxy-contract-prep/scripts-run* ---> generate all the results under *proxy-contract-prep/version1*
    * run *bash kill.sh* under *proxy-contract-prep/scripts-run* ---> kill all the running scripts

    #### Step 3: analyze all the proxy detection results
    We apply *ProxyEx* to the smart contract bytecode with a timeout of *60* seconds; there are *2,031,422* proxy addresses in total (3.25\%). The average detection times for proxy and non-proxy contracts are *14.85* seconds and *3.88* seconds, respectively.

    * download *first_contract.json* under *proxyex* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing) and place it under *proxy-contract-prep/scripts-analyze*
    * run *python3 analyze_proxy.py* under *proxy-contract-prep/scripts-analyze* ---> generate all the results under *proxy-contract-prep/scripts-analyze/stats1* for later analysis.
    * we have already uploaded our analysis results in *stats1.zip* on [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing); run *unzip stats1.zip* under *proxy-contract-prep/scripts-analyze* and you will get results such as *all_proxy.txt*, which lists all 2,031,422 proxy addresses.

    #### Step 4: manually analyze 1k contracts for accuracy
    To evaluate its effectiveness and performance, we randomly sampled 1,000 contracts from our dataset. Our examination revealed *548* proxy addresses and *452* non-proxy addresses.
    *ProxyEx* misclassified one proxy as non-proxy (false negative), indicating that our framework achieves *100\%* precision and over *99\%* recall.

    * *proxy-contract-prep/1k.csv* displays our manually checked results of 1,000 randomly sampled contracts

    ## 2. Logic Contract Preparation
    To extract logic contract addresses, we gather all the transaction traces associated with a *DELEGATECALL* sent from the proxy contracts. We collect a 3-tuple *{FromAddr, ToAddr, CallType}* for every trace from the Google BigQuery APIs, which we subsequently aggregate into transactions. In total, we collect 172,709,392 transactions for all the 2,031,422 proxy contracts.

    #### Step 1: extract transaction traces
    We run the SQL to download all the traces related to all our proxy contracts.

    * run *SELECT * FROM `bigquery-public-data.crypto_ethereum.traces` WHERE from_address IN ( SELECT trace FROM `moonlit-ceiling-399321.gmugcp.traces` ) OR to_address IN ( SELECT trace FROM `moonlit-ceiling-399321.gmugcp.traces` ) ORDER BY transaction_hash*; in particular, *moonlit-ceiling-399321.gmugcp.traces* is the table consisting of all the proxy contract addresses from *proxy-contract-prep/scripts-analyze/stats1/all_proxy.txt*.
    * the total transaction traces take around 1.3 TB of storage, so we cannot upload all of them here; we chose a segment of the data and stored it in "logic-contract-prep/data/sample.json"
    * you can fetch all the data using the URL *https://storage.googleapis.com/tracesdata/xxx.json*, where *xxx* ranges from *000000000000* to *000000004429* (a download sketch follows).
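
    A minimal Python sketch for fetching a few of these shards (the URL pattern and the shard range 000000000000 to 000000004429 come from the bullets above; the full corpus is around 1.3 TB, so download selectively):

        import os
        import urllib.request

        BASE = "https://storage.googleapis.com/tracesdata/{:012d}.json"
        os.makedirs("traces", exist_ok=True)
        # Fetch only the first three shards as an example; widen the range
        # (up to 4429) to fetch more.
        for i in range(3):
            dest = os.path.join("traces", f"{i:012d}.json")
            if not os.path.exists(dest):
                urllib.request.urlretrieve(BASE.format(i), dest)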


    #### Step 2: extract logic contracts
    We aggregate the transaction traces into transactions and obtain the related logic contracts for every proxy contract, sorted by the timestamp (block number).

    * run "analyze.py" under *logic-contract-prep/scripts-analyze* ---> generate all the results under *logic-contract-prep/scripts-analyze/impl.json*
    * however, *impl.json* is 30 GB, which is too large to include here; therefore, we provide a sample, *logic-contract-prep/scripts-analyze/sample_impl.json*
    * also, you can fetch the whole *impl.json* from [Google Drive](https://drive.google.com/drive/folders/1qcNFNrKk0OFRCciInWwNM1YFz6KeGyE_?usp=sharing)


    ## 3. Three Research Questions
    #### RQ1 - Statistics
    We perform statistical analysis of proxy contracts, covering bytecode duplication, transaction count, and lifespan.
    * Bytecode Duplication: run "iv_rq1_figure3.py" under *three-research-questions/rq1/script/*, which relies on the "iv_rq1_figure3.csv" data file under *three-research-questions/rq1/data/* ---> generates figure 3 in the paper.
    * Transaction Count: run "iv_rq1_figure4.py" under *three-research-questions/rq1/script/*, which relies on the "iv_rq1_figure4.txt" data file under *three-research-questions/rq1/data/* ---> generates figure 4 in the paper.
    * Lifespan: run "iv_rq1_figure5.py" under *three-research-questions/rq1/script/*, which relies on the "iv_rq1_figure5.txt" data file under *three-research-questions/rq1/data/* ---> generates figure 5 in the paper.

    #### RQ2 - Purposes
    We conduct a manual analysis to understand the purposes of proxy contracts, which we categorize into the following four types.
    * Upgradeability: run "v_rq2_figure6.py" under *three-research-questions/rq2/script/*, which relies on the "v_rq2_figure6.txt" data file under *three-research-questions/rq2/data/* ---> generates figure 6 in the paper.
    * Extensibility: The 32 contracts identified by the extensibility-proxy detection algorithm are listed in "extensibility_proxies.txt"; among them, one proxy, `0x4deca517d6817b6510798b7328f2314d3003abac`, is a vulnerable proxy with a proxy-logic collision bug (labelled "Audius Hack").
    * Code-sharing: The file "code_sharing.txt" contains the 1,137,317 code-sharing proxies and 3,309 code-sharing proxy clusters that we identified.
    * Code-hiding: The file "code_hiding.txt" contains the 1,213 code-hiding proxies that we identified. The first column in the csv file is the proxy address, while the second column contains a list of tuples: `claimed logic address in EIP1967 slot`, `actual logic address in execution`, `the block where such discrepancy is observed`.

    #### RQ3 - Bugs and Pitfalls

    In RQ3 we conduct a semi-automated detection of bugs and pitfalls in proxies.
    We leverage a set of automated helpers (as described in the paper) to help us prune non-vulnerable contracts before manual inspection.
    The automated helpers can be found in `pitfall-detection-helpers` folder.
    Note that the final results are obtained faithfully through manual inspection; the helper scripts are only used for data processing to reduce human effort.

    * Proxy-logic collision:
    - the file "proxy_logic_collision.txt" shows the 32 proxies that we identified as well as our manual inspection results.
    - the file "proxy_logic_collision_detector_evaluation_sampled_txs.txt" lists the 100 transactions sampled to evaluate the reliability of our automated helper which identifies storage slot read/write operations.
    * Logic-logic collision:
    - the file "logic_logic_collision.txt" contains the 15 proxies that we identified to have logic-logic collisions.
    - the file "logic_logic_collision_detector_evaluation_sampled_contract_pairs.csv" lists the 100 new-version/old-version logic contract pairs sampled to evaluate the reliability of our automated helper to identify storage collisions between two logic contracts.
    * Uninitialized contract:
    - the file "uninitialized.csv" contains 183 proxies that was not initialized in the same transaction of deployment and may be at risk of front-running attack. Whether they are still exploitable (i.e., re-initialize by malicious actors at present) is also labelled in the csv.
    - the file "identified_initialize_function_calldata.csv" lists the 100 logic contracts sampled to evaluate the quality of `initialize` calldata extracted by our automated helper.

  11. Austin Waste and Diversion

    • kaggle.com
    Updated Aug 25, 2017
    Cite
    Jacob Boysen (2017). Austin Waste and Diversion [Dataset]. https://www.kaggle.com/datasets/jboysen/austin-waste/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jacob Boysen
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Austin
    Description

    Context:

    This dataset is trash. Who in Austin makes it, who takes it, and where does it go?

    Content:

    Data ranges from 2008 to 2016 and includes dropoff site, load ID, time of load, type of load, weight of load, date, route number, and route type (recycling, street cleaning, garbage, etc.).

    Acknowledgements:

    This dataset was created by the Austin city government and hosted on Google Cloud Platform. You can use Kernels to analyze, share, and discuss this data on Kaggle, but if you're looking for real-time updates and bigger data, check out the data on BigQuery, too.
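
    For example, a hedged sketch of a first BigQuery query with the google-cloud-bigquery Python client; the mirrored table path and column names are assumptions to verify in the console:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Table and column names are assumptions based on this listing.
        sql = """
            SELECT route_type, SUM(load_weight) AS total_weight
            FROM `bigquery-public-data.austin_waste.waste_and_diversion`
            GROUP BY route_type
            ORDER BY total_weight DESC
        """
        print(client.query(sql).to_dataframe())
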

    Inspiration:

    • How much trash is Austin generating?
    • Which are the trashiest routes? Who recycles the best?
    • Any seasonal changes?
    • Try to predict trash route usage from historical trash data
  12. CFPB Consumer Complaint Database

    • console.cloud.google.com
    Updated Jan 25, 2020
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Consumer%20Financial%20Protection%20Bureau&inv=1&invt=Ab1YPQ (2020). CFPB Consumer Complaint Database [Dataset]. https://console.cloud.google.com/marketplace/product/cfpb/complaint-database
    Explore at:
    Dataset updated
    Jan 25, 2020
    Dataset provided by
    Google (http://google.com/)
    Description

    The Consumer Complaint Database is a collection of complaints about consumer financial products and services that we sent to companies for response. Complaints are published after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first. Complaints referred to other regulators, such as complaints about depository institutions with less than $10 billion in assets, are not published in the Consumer Complaint Database.

    This database is not a statistical sample of consumers' experiences in the marketplace. Complaints are not necessarily representative of all consumers' experiences, and complaints do not constitute "information" for purposes of the Information Quality Act. Complaint volume should be considered in the context of company size and/or market share. For example, companies with more customers may have more complaints than companies with fewer customers. We encourage you to pair complaint data with public and private datasets for additional context.

    The Bureau publishes the consumer's narrative description of his or her experience if the consumer opts to share it publicly and after the Bureau removes personal information. We don't verify all the allegations in complaint narratives. Unproven allegations in consumer narratives should be regarded as opinion, not fact. We do not adopt the views expressed and make no representation that consumers' allegations are accurate, clear, complete, or unbiased in substance or presentation. Users should consider what conclusions may be fairly drawn from complaints alone.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
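
    A sketch of a starter query with the google-cloud-bigquery Python client, assuming the public mirror at bigquery-public-data.cfpb_complaints.complaint_database:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Table path is an assumption; verify in the BigQuery console.
        sql = """
            SELECT product, COUNT(*) AS complaints
            FROM `bigquery-public-data.cfpb_complaints.complaint_database`
            GROUP BY product
            ORDER BY complaints DESC
        """
        print(client.query(sql).to_dataframe())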

  13. NYC Citi Bike Trips

    • console.cloud.google.com
    Updated Jul 1, 2022
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:City%20of%20New%20York&inv=1&invt=Ab1CqA (2022). NYC Citi Bike Trips [Dataset]. https://console.cloud.google.com/marketplace/product/city-of-new-york/nyc-citi-bike
    Explore at:
    Dataset updated
    Jul 1, 2022
    Dataset provided by
    Google (http://google.com/)
    Area covered
    New York
    Description

    Citi Bike is the nation's largest bike share program, with 10,000 bikes and 600 stations across Manhattan, Brooklyn, Queens, and Jersey City. This dataset includes Citi Bike trips since Citi Bike launched in September 2013 and is updated daily. The data has been processed by Citi Bike to remove trips that are taken by staff to service and inspect the system, as well as any trips below 60 seconds in length, which are considered false starts. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
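
    A sketch of a starter query with the google-cloud-bigquery Python client, assuming the public mirror at bigquery-public-data.new_york_citibike.citibike_trips:

        from google.cloud import bigquery

        client = bigquery.Client(project="your-billing-project")  # placeholder
        # Table path is an assumption; verify in the BigQuery console.
        sql = """
            SELECT start_station_name, COUNT(*) AS trips
            FROM `bigquery-public-data.new_york_citibike.citibike_trips`
            GROUP BY start_station_name
            ORDER BY trips DESC
            LIMIT 10
        """
        print(client.query(sql).to_dataframe())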

  14. Road Transport: BRT GPS History

    • data.rio
    • hub.arcgis.com
    Updated Jun 8, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Transporte Rodoviário: Histórico de GPS do BRT [Dataset]. https://www.data.rio/documents/a17608e589864376bfad313e026c4681
    Explore at:
    Dataset updated
    Jun 8, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Historical geographic position data for BRT vehicles.

    The complete data is available for querying and download in the data.rio data lake. The data is captured every minute and processed every hour. Data is subject to change, e.g. corrections of capture gaps and/or processing adjustments.

    How to access

    On this page

    Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

    BigQuery

        SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000

    Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.

    Python

        import basedosdados as bd

        # Load the data directly into pandas
        df = bd.read_sql(
            "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000",
            billing_project_id="<id_do_seu_projeto_gcp>"
        )
    

    R

        install.packages("basedosdados")
        library("basedosdados")

        # Set your Google Cloud project
        set_billing_id("<id_do_seu_projeto_gcp>")

        # Load the data directly into R
        tb <- read_sql("SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000")
    

    Temporal coverage: 24/11/2021 to the present

    Update frequency: Hourly

    Managing agency: Secretaria Municipal de Transportes (SMTR)

    Columns

        modo: Transport mode; this table contains only BRT.
        timestamp_gps: Timestamp of the GPS signal emission.
        data: Date of the GPS signal emission timestamp.
        hora: Hour of the GPS signal emission timestamp.
        id_veiculo: Vehicle identifier code (order number).
        servico: Service performed by the vehicle.
        latitude: Geographic coordinate (y axis) in decimal degrees (EPSG:4326 - WGS84).
        longitude: Geographic coordinate (x axis) in decimal degrees (EPSG:4326 - WGS84).
        flag_em_movimento: Vehicles with 'velocidade' below 'velocidade_limiar_parado' are considered stopped (false); otherwise they are considered moving (true).
        tipo_parada: Identifies vehicles stopped at terminals or garages.
        flag_linha_existe_sigmob: Flag indicating whether the reported line exists in SIGMOB.
        velocidade_instantanea: Instantaneous speed of the vehicle, as reported by the GPS (km/h).
        velocidade_estimada_10_min: Average speed over the last 10 minutes of operation (km/h).
        distancia: Distance from the previous GPS position to the current position (m).
        versao: Data version control code (GitHub SHA).
    

    Publisher information

    Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
    E-mail: dados.smtr@prefeitura.rio
    
  15. Environment: Precipitation Rate (GOES-16)

    • data.rio
    • hub.arcgis.com
    Updated Jun 2, 2022
    + more versions
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Taxa de Precipitação (GOES-16) [Dataset]. https://www.data.rio/documents/48c0210e96074b48b401ec2fa4ad99b3
    Explore at:
    Dataset updated
    Jun 2, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Estimated precipitation rate for areas of southeastern Brazil. Estimates are made every hour, with each record containing the data from one estimate. Each area is a square 4 km on a side. Data collected by the GOES-16 satellite.

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

      SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000

  Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
    
    
  Python

      import basedosdados as bd

      # Load the data directly into pandas
      df = bd.read_sql(
          "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000",
          billing_project_id="<id_do_seu_projeto_gcp>"
      )

  R

      install.packages("basedosdados")
      library("basedosdados")

      # Set your Google Cloud project
      set_billing_id("<id_do_seu_projeto_gcp>")

      # Load the data directly into R
      tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000")
    
    
    
    
    
    
  Temporal coverage: From 2020 to the current date

  Update frequency: Daily

  Managing agency: Centro de Operações da Prefeitura do Rio (COR)
    
    
    
    
  Columns

      latitude: Latitude of the center of the area.
      longitude: Longitude of the center of the area.
      rrqpe: Estimated precipitation rate, measured in millimeters per hour.
      primary_key: Primary key created by concatenating the date, time, latitude, and longitude columns; used to avoid duplicate records.
      horario: Time at which the measurement was taken.
      data_particao: Date on which the measurement was taken.
    
    
    
    
    
    
    
  Publisher information

  Name: Patrícia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  16. Environment: Rain Gauge Stations (AlertaRio)

    • datario-pcrj.hub.arcgis.com
    • data.rio
    Updated Jun 2, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Estações pluviométricas (AlertaRio) [Dataset]. https://datario-pcrj.hub.arcgis.com/documents/cc4863712d65418abd8b2063a50bf453
    Explore at:
    Dataset updated
    Jun 2, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Data on the rain gauge stations of AlertaRio (Sistema Alerta Rio da Prefeitura do Rio de Janeiro) in the city of Rio de Janeiro.

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

      SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000

  Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
    
    
  Python

      import basedosdados as bd

      # Load the data directly into pandas
      df = bd.read_sql(
          "SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000",
          billing_project_id="<id_do_seu_projeto_gcp>"
      )

  R

      install.packages("basedosdados")
      library("basedosdados")

      # Set your Google Cloud project
      set_billing_id("<id_do_seu_projeto_gcp>")

      # Load the data directly into R
      tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000")
    
    
    
    
    
    
  Temporal coverage: N/A

  Update frequency: Annual

  Managing agency: COR
    
    
    
    
  Columns

      x: X UTM coordinate (SAD69, zone 23).
      longitude: Longitude where the station is located.
      id_estacao: Station ID defined by AlertaRio.
      estacao: Station name.
      latitude: Latitude where the station is located.
      cota: Elevation, in meters, at which the station is located.
      endereco: Full address of the station.
      situacao: Indicates whether the station is operational or failing.
      data_inicio_operacao: Date the station began operating.
      data_fim_operacao: Date the station stopped operating.
      data_atualizacao: Last date on which the operating-period information was updated.
      y: Y UTM coordinate (SAD69, zone 23).
    
    
    
    
    
    
    
  Publisher information

  Name: Patricia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  17. Environment: Weather Stations (INMET/BDMET)

    • data.rio
    • datario-pcrj.hub.arcgis.com
    • +1more
    Updated Jun 3, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Estações meteorológicas (INMET/BDMET) [Dataset]. https://www.data.rio/documents/f14b1ed52be447379383acbb96353e1c
    Explore at:
    Dataset updated
    Jun 3, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Data on the weather stations of INMET (Instituto Nacional de Meteorologia) in the city of Rio de Janeiro.

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

      SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000

  Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
    
    
  Python

      import basedosdados as bd

      # Load the data directly into pandas
      df = bd.read_sql(
          "SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000",
          billing_project_id="<id_do_seu_projeto_gcp>"
      )

  R

      install.packages("basedosdados")
      library("basedosdados")

      # Set your Google Cloud project
      set_billing_id("<id_do_seu_projeto_gcp>")

      # Load the data directly into R
      tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.estacoes_inmet` LIMIT 1000")
    
    
    
    
    
    
  Temporal coverage: N/A

  Update frequency: Never

  Managing agency: INMET
    
    
    
    
  Columns

      id_municipio: 7-digit IBGE municipality code.
      latitude: Latitude where the station is located.
      data_inicio_operacao: Date the station began operating.
      data_fim_operacao: Date the station stopped operating.
      situacao: Indicates whether the station is operational or failing.
      tipo_estacao: Indicates whether the station is automatic or manual; may contain nulls.
      entidade_responsavel: Entity responsible for the station.
      data_atualizacao: Last date on which the operating-period information was updated.
      longitude: Longitude where the station is located.
      sigla_uf: State abbreviation.
      id_estacao: Station ID defined by INMET.
      nome_estacao: Station name.
    
    
    
    
    
    
    
  Publisher information

  Name: Patricia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  18. Public Services Administration: Calls Made to 1746

    • hub.arcgis.com
    • datario-pcrj.hub.arcgis.com
    Updated Jun 2, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Administração de Serviços Públicos: Chamados feitos ao 1746 [Dataset]. https://hub.arcgis.com/documents/52b6bd003abf4b8995ec9860e65a82c5
    Explore at:
    Dataset updated
    Jun 2, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Calls made to 1746, the city's service hotline. The data covers calls since March 2011, when the 1746 project began.

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

      SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000

  Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
    
    
  Python

      import basedosdados as bd

      # Load the data directly into pandas
      df = bd.read_sql(
          "SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000",
          billing_project_id="<id_do_seu_projeto_gcp>"
      )

  R

      install.packages("basedosdados")
      library("basedosdados")

      # Set your Google Cloud project
      set_billing_id("<id_do_seu_projeto_gcp>")

      # Load the data directly into R
      tb <- read_sql("SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000")
    
    
    
    
    
    
  Temporal coverage: March 2011 onward

  Update frequency: Daily

  Managing agency: SEGOVI
    
    
    
    
      Columns

            id_chamado: Unique identifier of the ticket in the database.

            data_inicio: Date the ticket was opened, i.e. when the operator registered it.

            data_fim: Date the ticket was closed. A ticket is closed when the request is fulfilled or when it turns out that it cannot be fulfilled.

            id_bairro: Unique database identifier of the neighborhood where the event that generated the ticket occurred.

            id_territorialidade: Unique database identifier of the territoriality where the event that generated the ticket occurred. A territoriality is a region of the city of Rio de Janeiro that has a specific agency as its responsible body. Example: CDURP, which is responsible for the port region of Rio de Janeiro.

            id_logradouro: Unique database identifier of the street where the event that generated the ticket occurred.

            numero_logradouro: Street number where the event that generated the ticket occurred.

            id_unidade_organizacional: Unique database identifier of the agency that handles the ticket. Example: the COMLURB identifier when the ticket concerns urban cleaning.

            nome_unidade_organizacional: Name of the agency that handles the request. Example: COMLURB when the request concerns urban cleaning.

            unidade_organizadional_ouvidoria: Boolean indicating whether the citizen's ticket was filed through the Ombudsman (Ouvidoria): 1 if yes, 0 if no.

            categoria: Ticket category. Examples: service, information, suggestion, praise, complaint, criticism.

            id_tipo: Unique database identifier of the ticket type. Example: public lighting.

            tipo: Name of the ticket type. Example: public lighting.

            id_subtipo: Unique database identifier of the ticket subtype. Example: repair of a burned-out streetlight.

            subtipo: Name of the ticket subtype. Example: repair of a burned-out streetlight.

            status: Ticket status. Examples: closed with resolution, open and in progress, pending, etc.

            longitude: Longitude of the place of the event that motivated the ticket.

            latitude: Latitude of the place of the event that motivated the ticket.

            data_alvo_finalizacao: Target date for fulfilling the ticket. When prazo_tipo is D, this field stays empty until the diagnosis is made.

            data_alvo_diagnostico: Target date for the service diagnosis. When prazo_tipo is F, this date stays empty.

            data_real_diagnostico: Date the service diagnosis was actually made. When prazo_tipo is F, this date stays empty.

            tempo_prazo: Deadline for carrying out the service, in days or hours after the ticket is opened. When a diagnosis is required, the deadline counts from the diagnosis.

            prazo_unidade: Time unit of the deadline: D (days) or H (hours).

            prazo_tipo: D (diagnosis) or F (finalization). Indicates whether the ticket needs a diagnosis. Some services require an assessment before they can be carried out; tree pruning, for example, needs an environmental engineer to confirm whether the pruning is necessary.

            id_unidade_organizacional_mae: ID of the parent organizational unit of the agency that handles the request. Example: "CVA - Coordenação de Vigilância de Alimentos" handles the request and reports to the parent unit "IVISA-RIO - Instituto Municipal de Vigilância Sanitária, de Zoonoses e de Inspeção Agropecuária"; this column holds the ID of the latter.

            situacao: Indicates whether the ticket has been closed.

            tipo_situacao: Current ticket status, one of: Atendido (fulfilled), Atendido parcialmente (partially fulfilled), Não atendido (not fulfilled), Não constatado (not verified), Andamento (in progress).

            dentro_prazo: Indicates whether the ticket's target completion date is still within the stipulated deadline.

            justificativa_status: Justification used by the agencies when setting the status. Example: SEM POSSIBILIDADE DE ATENDIMENTO - justification: outside the municipality's area of operation.

            reclamacoes: Number of complaints.
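
      With the schema above, a first analysis can run entirely inside BigQuery. The sketch below is a hypothetical example, not an official recipe: it reuses the basedosdados setup shown above and assumes data_inicio and data_fim are date- or timestamp-like columns. It estimates the average resolution time per ticket type:

        import basedosdados as bd

        # Hypothetical analysis: average resolution time in days per ticket
        # type, over closed tickets only. Aggregating in SQL keeps the result
        # pulled into pandas small.
        query = """
        SELECT
            tipo,
            COUNT(*) AS n_chamados,
            AVG(DATE_DIFF(CAST(data_fim AS DATE), CAST(data_inicio AS DATE), DAY)) AS media_dias
        FROM `datario.administracao_servicos_publicos.chamado_1746`
        WHERE data_fim IS NOT NULL
        GROUP BY tipo
        ORDER BY n_chamados DESC
        """
        df = bd.read_sql(query, billing_project_id="<id_do_seu_projeto_gcp>")
        print(df.head(10))

      The dentro_prazo and data_alvo_finalizacao columns support the same kind of aggregate for on-time completion rates.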
    
    
    
    
    
    
    
      Publisher information

      Name: Patricia Catandi
      E-mail: patriciabcatandi@gmail.com
    
  19. Transporte Rodoviário: Histórico de GPS dos ônibus (SPPO)

    • data.rio
    • hub.arcgis.com
    • +1more
    Updated Jun 8, 2022
    + more versions
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Transporte Rodoviário: Histórico de GPS dos ônibus (SPPO) [Dataset]. https://www.data.rio/documents/6409ea499d474bfeb4063cfc31203403
    Explore at:
    Dataset updated
    Jun 8, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0)https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Full data available for querying and download in the data.rio data lake. The data is captured every minute and processed every hour. It is subject to change, such as corrections of capture gaps and/or processing adjustments.

      How to access

      On this page

      Here you will find a button to download the data as a gzip-compressed CSV file. Alternatively, for the same result, you can click here.
    
    
      BigQuery
    
    
    
    
          SELECT *
          FROM `datario.transporte_rodoviario_municipal.gps_onibus`
          LIMIT 1000
    
    
    
    
      Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
    
    
      Python
    
    
    
        import basedosdados as bd

        # To load the data directly into pandas
        df = bd.read_sql(
            "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_onibus` LIMIT 1000",
            billing_project_id="<id_do_seu_projeto_gcp>",
        )
    
    
    
    
      R
    
    
    
        install.packages("basedosdados")
        library("basedosdados")

        # Set your Google Cloud project
        set_billing_id("<id_do_seu_projeto_gcp>")

        # To load the data directly into R
        tb <- read_sql(
          "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_onibus` LIMIT 1000"
        )
    
    
    
    
    
    
      Temporal coverage

      March 1, 2021 to the present


      Update frequency

      Hourly


      Managing agency

      Secretaria Municipal de Transportes
    
    
    
    
      Columns

            modo: SPPO; only this mode appears in this table.

            timestamp_gps: Timestamp at which the GPS signal was emitted.

            data: Date component of the GPS signal timestamp.

            hora: Hour component of the GPS signal timestamp.

            id_veiculo: Vehicle identifier code (order number).

            servico: Service operated by the vehicle.

            latitude: Geographic coordinate on the y axis, in decimal degrees (EPSG:4326 - WGS84).

            longitude: Geographic coordinate on the x axis, in decimal degrees (EPSG:4326 - WGS84).

            flag_em_movimento: Vehicles whose 'velocidade' is below 'velocidade_limiar_parado' are considered stopped (false); otherwise they are considered moving (true).

            tipo_parada: Identifies vehicles stopped at terminals or garages.

            flag_linha_existe_sigmob: Flag indicating whether the reported line exists in SIGMOB.

            velocidade_instantanea: Instantaneous vehicle speed as reported by the GPS (km/h).

            velocidade_estimada_10_min: Average speed over the last 10 minutes of operation (km/h).

            distancia: Distance from the previous GPS position to the current one (m).

            fonte_gps: GPS data provider (zirix or conecta).

            versao: Data version control code (GitHub SHA).
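
      Since each row is a single GPS ping, fleet-level indicators reduce to simple aggregations. A minimal sketch under the same basedosdados assumptions as above; the date filter is an arbitrary example day and assumes the data column is a DATE (filtering on it also bounds how much data the query scans) and that flag_em_movimento is stored as a boolean, as its description suggests:

        import basedosdados as bd

        # Hypothetical analysis: per-service share of pings in motion and mean
        # instantaneous speed on one example day.
        query = """
        SELECT
            servico,
            COUNTIF(flag_em_movimento) / COUNT(*) AS fracao_em_movimento,
            AVG(velocidade_instantanea) AS velocidade_media_kmh
        FROM `datario.transporte_rodoviario_municipal.gps_onibus`
        WHERE data = '2022-06-01'
        GROUP BY servico
        ORDER BY velocidade_media_kmh DESC
        """
        df = bd.read_sql(query, billing_project_id="<id_do_seu_projeto_gcp>")
        print(df.head(10))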
    
    
    
    
    
    
    
      Publisher information

      Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
      E-mail: dados.smtr@prefeitura.rio
    
  20. Dados do sistema Comando (COR): ocorrencias

    • data.rio
    • datario-pcrj.hub.arcgis.com
    • +1more
    Updated Oct 5, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Dados do sistema Comando (COR): ocorrencias [Dataset]. https://www.data.rio/documents/f9ddaeb4ac754975846716f084645f3d
    Explore at:
    Dataset updated
    Oct 5, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0)https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Occurrences dispatched by the COR since 2015. In the city of Rio de Janeiro, an occurrence is an event that requires monitoring and, in most cases, an action by the PCRJ: for example, a pothole in the road, a pool of standing water, or a vehicle breakdown. An open occurrence is one that has not yet been resolved. The data can also be accessed through the Data Office API: https://api.dados.rio/v1/

      How to access

      On this page

      Here you will find a button to download the data as a gzip-compressed CSV file. Alternatively, for the same result, you can click here.
    
    
      BigQuery
    
    
    
    
          SELECT *
          FROM `datario.adm_cor_comando.ocorrencias`
          LIMIT 1000
    
    
    
    
      Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery, see our documentation to learn how to access the data.
    
    
      Python
    
    
    
        import basedosdados as bd

        # To load the data directly into pandas
        df = bd.read_sql(
            "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000",
            billing_project_id="<id_do_seu_projeto_gcp>",
        )
    
    
    
    
      R
    
    
    
        install.packages("basedosdados")
        library("basedosdados")

        # Set your Google Cloud project
        set_billing_id("<id_do_seu_projeto_gcp>")

        # To load the data directly into R
        tb <- read_sql(
          "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000"
        )
    
    
    
    
    
    
      Temporal coverage

      Not specified.


      Update frequency

      Daily


      Managing agency

      COR
    
    
    
    
      Columns

            data_inicio: Date and time the event was registered by the PCRJ.

            data_fim: Date and time the event was closed by the PCRJ. An event is closed when it is resolved; this attribute is empty while the event is open.

            bairro: Neighborhood where the event occurred.

            id_pop: POP identifier.

            status: Event status (ABERTO = open, FECHADO = closed).

            gravidade: Event severity (BAIXO, MEDIO, ALTO, CRITICO).

            prazo: Expected resolution horizon for the event: CURTO (short), MEDIO (over 3 days), LONGO (over 5 days).

            latitude: Latitude (WGS-84) where the event occurred.

            longitude: Longitude (WGS-84) where the event occurred.

            id_evento: Event identifier.

            descricao: Event description.

            tipo: Event type (PRIMARIO, SECUNDARIO).
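
      As a quick way to exercise the schema, the sketch below (same basedosdados assumptions as in the snippets above, using the documented ABERTO status value) counts currently open occurrences by severity and neighborhood:

        import basedosdados as bd

        # Hypothetical analysis: open occurrences by severity and neighborhood.
        query = """
        SELECT
            gravidade,
            bairro,
            COUNT(*) AS n_ocorrencias
        FROM `datario.adm_cor_comando.ocorrencias`
        WHERE status = 'ABERTO'
        GROUP BY gravidade, bairro
        ORDER BY n_ocorrencias DESC
        """
        df = bd.read_sql(query, billing_project_id="<id_do_seu_projeto_gcp>")
        print(df.head(10))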
    
    
    
    
    
    
    
      Publisher information

      Name: Patrícia Catandi
      E-mail: patriciabcatandi@gmail.com
    