24 datasets found
  1. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
    Explore at:
Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2018
    Dataset provided by
BigQuery: https://cloud.google.com/bigquery
Google: http://google.com/
    Authors
    Google BigQuery
    License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

Fork this notebook to get started accessing data in the BigQuery dataset by writing SQL queries with the BQhelper module; a minimal sketch follows.
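A minimal sketch of that workflow with the bq_helper package; the publications table is part of this dataset's documented schema, and the query itself is only illustrative:

    from bq_helper import BigQueryHelper

    # Attach the helper to the "patents" dataset in the patents-public-data project
    bq = BigQueryHelper("patents-public-data", "patents")
    print(bq.list_tables())

    query = """
    SELECT country_code, COUNT(*) AS n_publications
    FROM `patents-public-data.patents.publications`
    GROUP BY country_code
    ORDER BY n_publications DESC
    LIMIT 10
    """
    # Refuses to run if the estimated scan exceeds max_gb_scanned
    df = bq.query_to_pandas_safe(query, max_gb_scanned=10)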

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents


    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  2. BigQuery GIS Utility Datasets (U.S.)

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Google BigQuery (2019). BigQuery GIS Utility Datasets (U.S.) [Dataset]. https://www.kaggle.com/bigquery/utility-us
    Explore at:
Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
BigQuery: https://cloud.google.com/bigquery
Google: http://google.com/
    Authors
    Google BigQuery
    License

CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.utility_us.[TABLENAME].

    • Project: "bigquery-public-data"
    • Table: "utility_us"

Fork this kernel to get started and learn how to safely manage queries against large BigQuery datasets.

    If you're using Python, you can start with this code:

    import pandas as pd
    from bq_helper import BigQueryHelper

    # Point the helper at the public "utility_us" dataset
    bq_assistant = BigQueryHelper("bigquery-public-data", "utility_us")
    bq_assistant.list_tables()  # see which tables are available before querying
    
  3. github_meta

    • huggingface.co
    Updated Aug 9, 2024
    Cite
    DeepGit (2024). github_meta [Dataset]. https://huggingface.co/datasets/deepgit/github_meta
    Explore at:
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    DeepGit
    License

Open Software License 3.0 (OSL-3.0): https://choosealicense.com/licenses/osl-3.0/

    Description

    Process to Generate DuckDB Dataset

      1. Load Repository Metadata

    Read repo_metadata.json from GitHub Public Repository Metadata and normalize the JSON into three lists:
    • Repositories → general metadata (stars, forks, license, etc.)
    • Languages → repo-language mappings with size
    • Topics → repo-topic mappings

    Convert the lists into Pandas DataFrames: df_repos, df_languages, df_topics.
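A minimal sketch of this normalization step, assuming repo_metadata.json is a list of repository objects with nested languages and topics arrays (the field names here are assumptions about that file's layout):

    import json
    import pandas as pd

    with open("repo_metadata.json") as f:
        repos = json.load(f)

    # One row per repository: general metadata
    df_repos = pd.DataFrame(
        [{"repo": r["nameWithOwner"], "stars": r["stars"],
          "forks": r["forks"], "license": r.get("license")} for r in repos]
    )

    # One row per (repo, language) pair, keeping the language size
    df_languages = pd.DataFrame(
        [{"repo": r["nameWithOwner"], "language": lang["name"], "size": lang["size"]}
         for r in repos for lang in r.get("languages", [])]
    )

    # One row per (repo, topic) pair
    df_topics = pd.DataFrame(
        [{"repo": r["nameWithOwner"], "topic": t}
         for r in repos for t in r.get("topics", [])]
    )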

      2. Enhance with BigQuery Data
    

    Create a temporary BigQuery table (repo_list)… See the full description on the dataset page: https://huggingface.co/datasets/deepgit/github_meta.

  4. gnomAD

    • console.cloud.google.com
    Updated Jul 25, 2023
    Cite
Broad Institute of MIT and Harvard (2023). gnomAD [Dataset]. https://console.cloud.google.com/marketplace/product/broad-institute/gnomad?hl=zh_TW
    Explore at:
    Dataset updated
    Jul 25, 2023
    Dataset provided by
Google: http://google.com/
    Description

The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated by a "_chr*" suffix); using the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery, and VEP annotations were parsed into separate columns for easier analysis using Variant Transforms' annotation support. These public datasets are included in BigQuery's free tier: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets, or use this quick start guide to learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
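A hedged sketch of a per-chromosome query with the google-cloud-bigquery client; the shard name v3_genomes__chr21 below is an assumption, so list the tables in the public gnomAD dataset to find the actual names:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project-id")  # your billing project

    # Querying a single chromosome shard keeps the scanned bytes (and cost) low
    query = """
    SELECT COUNT(*) AS n_variants
    FROM `bigquery-public-data.gnomAD.v3_genomes__chr21`
    """
    print(client.query(query).to_dataframe())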

  5. Reverse ETL Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Reverse ETL Market Research Report 2033 [Dataset]. https://dataintelo.com/report/reverse-etl-market
    Explore at:
Available download formats: pdf, pptx, csv
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Reverse ETL Market Outlook



    According to our latest research, the global Reverse ETL market size reached USD 485.7 million in 2024, demonstrating robust momentum driven by the increasing need for operationalizing analytics and democratizing data access across enterprises. The market is projected to grow at a CAGR of 34.6% during the forecast period, reaching a substantial USD 5,392.1 million by 2033. This remarkable growth trajectory is primarily fueled by the rising adoption of cloud-based data integration solutions, the proliferation of business intelligence platforms, and the growing emphasis on real-time data activation for business users.




    One of the key growth factors for the Reverse ETL market is the evolving landscape of data-driven decision-making within organizations. As enterprises increasingly invest in advanced analytics and machine learning platforms, the need to operationalize insights and deliver actionable data directly into business applications has become more critical than ever. Reverse ETL solutions bridge the gap between data warehouses and operational tools, enabling seamless data flow from centralized repositories to customer relationship management (CRM), marketing automation, and sales enablement platforms. This capability empowers business teams to act on insights in real time, enhancing agility, personalization, and overall business performance.




    Another significant driver is the rapid digital transformation witnessed across industries, particularly in sectors such as retail, BFSI, and healthcare. Organizations are leveraging Reverse ETL to unify and activate customer data, optimize marketing campaigns, and streamline sales operations. The shift towards omnichannel engagement and personalized customer experiences has necessitated the integration of analytics outputs into operational workflows. As a result, Reverse ETL platforms are gaining traction as essential components of modern data stacks, supporting use cases ranging from customer segmentation and churn prediction to dynamic pricing and supply chain optimization. This widespread applicability is expected to sustain high demand for Reverse ETL solutions throughout the forecast period.




    The Reverse ETL market is also benefiting from advancements in cloud computing and the rise of data infrastructure modernization initiatives. As enterprises migrate their data warehouses and analytics workloads to the cloud, there is a growing need for scalable, secure, and easy-to-integrate Reverse ETL tools that can handle large volumes of data and support complex transformation logic. Vendors are responding by offering cloud-native solutions with robust APIs, pre-built connectors, and enhanced security features. This trend is particularly pronounced among large enterprises seeking to break down data silos and enable cross-functional collaboration. Furthermore, the emergence of low-code and no-code Reverse ETL platforms is democratizing data access for non-technical business users, further accelerating market growth.




    From a regional perspective, North America continues to dominate the Reverse ETL market, accounting for the largest revenue share in 2024, followed by Europe and Asia Pacific. The strong presence of technology innovators, early adopters, and a mature cloud ecosystem in the United States and Canada has fueled rapid adoption of Reverse ETL solutions. Meanwhile, Asia Pacific is witnessing the fastest growth, driven by digital transformation initiatives in emerging economies such as China and India, as well as increasing investments in cloud infrastructure and analytics capabilities. Europe is also experiencing steady uptake, particularly in highly regulated industries such as BFSI and healthcare, where data privacy and compliance requirements are paramount. Latin America and the Middle East & Africa are gradually catching up, supported by expanding digital infrastructure and growing awareness of data-driven business models.



    Component Analysis



    The component segment of the Reverse ETL market is bifurcated into Software and Services. Software solutions represent the core of the Reverse ETL ecosystem, encompassing platforms and tools that facilitate the extraction, transformation, and loading of data from analytical warehouses into operational systems. These platforms are designed to integrate seamlessly with popular data warehouses such as Snowflake, Google BigQuery, and Amazon Redshift, as well

  6. FitBit Fitness Tracker Data (revised)

    • kaggle.com
    zip
    Updated Dec 17, 2022
    Cite
    duart2688 (2022). FitBit Fitness Tracker Data (revised) [Dataset]. https://www.kaggle.com/duart2688/fitabase-data-cleaned-using-sql
    Explore at:
Available download formats: zip (12763010 bytes)
    Dataset updated
    Dec 17, 2022
    Authors
    duart2688
    License

CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.

    Main modifications

This is the list of manipulations performed on the original dataset, published by Möbius. All cleaning and rearrangement was performed in BigQuery using SQL.

1) After taking a closer look at the source dataset, I realized that for my case study I did not need some of the tables contained in the original archive. Therefore, I decided not to import:
- dailyCalories_merged.csv
- dailyIntensities_merged.csv
- dailySteps_merged.csv
as they proved redundant; their content can be found in the dailyActivity_merged.csv file. In addition, the files:
- minutesCaloriesWide_merged.csv
- minutesIntensitiesWide_merged.csv
- minuteStepsWide_merged.csv
were not imported, as they presented the same data contained in other files in a wide format. Hence, only the long-format files containing the same data were imported into the BigQuery database.

2) To be able to compare and measure the correlation among different variables based on hourly records, I created a new table using a LEFT JOIN on the Id and ActivityHour columns. I repeated the same JOIN on the tables with minute records, obtaining two new tables:
- hourly_activity.csv
- minute_activity.csv

3) To validate most of the columns containing DATE and DATETIME values that were imported as STRING data type, I used the PARSE_DATE() and PARSE_DATETIME() functions. While importing the files
- heartrate_seconds_merged.csv
- hourlyCalories_merged.csv
- hourlyIntensities_merged.csv
- hourlySteps_merged.csv
- minutesCaloriesNarrow_merged.csv
- minuteIntensitiesNarrow_merged.csv
- minuteMETsNarrow_merged.csv
- minuteSleep_merged.csv
- minuteSteps_merged.csv
- sleepDay_merge.csv
- weigthLog_Info_merged.csv
to BigQuery, it was necessary to import the DATETIME and DATE columns as STRING, because the original syntax used in the CSV files could not be recognized as a valid DATETIME data type due to the "AM" and "PM" text at the end of each value. A sketch of the parsing step follows.
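A minimal sketch of that validation step, assuming a staging table your-project.fitbit.hourly_calories_raw whose ActivityHour column holds strings like "4/12/2016 1:00:00 AM" (the table and column names here are assumptions):

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project-id")

    # PARSE_DATETIME with %p consumes the trailing AM/PM that blocked
    # BigQuery's automatic DATETIME detection on import
    query = """
    SELECT
      Id,
      PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', ActivityHour) AS activity_hour,
      Calories
    FROM `your-project.fitbit.hourly_calories_raw`
    """
    df = client.query(query).to_dataframe()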

Acknowledgements

1. Möbius' version of the dataset can be found here.
2. Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa. https://zenodo.org/record/53894#.YMoUpnVKiP9-
  7. Data and code for: The Contributor Role Taxonomy (CRediT) at ten: a...

    • figshare.com
    pdf
    Updated Nov 19, 2025
    Cite
Simon Porter; Ruth Whittam; Liz Allen; Veronique Kiermer (2025). Data and code for: The Contributor Role Taxonomy (CRediT) at ten: a retrospective analysis of the diversity of contributions to published research output [Dataset]. http://doi.org/10.6084/m9.figshare.28816703.v1
    Explore at:
Available download formats: pdf
    Dataset updated
    Nov 19, 2025
    Dataset provided by
Figshare: http://figshare.com/
    Authors
    Simon Porter; Ruth Whittam; Liz Allen; Veronique Kiermer
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

About this notebook

This notebook was created using a helper script, Base.ipynb, which provides helper functions that push output directly to Datawrapper to generate the graphs included in the opinion piece. To run with BigQuery alone, without the helper functions, use:

    !pip install google-cloud-bigquery

then add:

    from google.cloud import bigquery
    from google.cloud.bigquery import magics

    project_id = "your_project"  # update as needed
    magics.context.project = project_id
    bq_params = {}
    client = bigquery.Client(project=project_id)
    %load_ext google.cloud.bigquery

and finally comment out the make_chart lines.

About dimensions-ai-integrity.ds_dp_pipeline_ripeta_staging.trust_markers_raw

dimensions-ai-integrity.ds_dp_pipeline_ripeta_staging.trust_markers_raw is an internal table produced by running a process over the text of publications to identify trust-marker segments, including author contributions. The process aims to automatically segment research papers into their constituent sections. It operates by identifying headings within the text based on a pre-defined set of patterns and a rule-based system. The system first cleans and normalizes the input text. It then employs regular expressions to detect potential section headings. These potential headings are validated against a set of rules that consider factors such as capitalization, the context of surrounding words, and the typical order of sections within a research paper (e.g., certain sections not appearing after "References" or before "Abstract"). Specific rules also handle exceptions for particular heading types like "Keywords" or "Appendices". Once valid headings are identified, the system extracts the corresponding textual content for each section. The output is a structured representation of the paper, categorizing text segments under their respective heading types; any text that doesn't fall under a recognized heading is identified as unlabeled content. The overall process aims to provide a structured understanding of the document's organization for subsequent analysis.

Author Contributions segments are identified using the following regex:

    "author_contributions": [
        "((credit|descript(ion(?:s)?|ive)| )*author(s|'s|ship|s')?( |contribution(?:s)?|statement(?:s)?|role(?:s)?){2,})",
        "contribution(?:s)"
    ]

Access to dimensions-ai-integrity.ds_dp_pipeline_ripeta_staging.trust_markers_raw is available to peer reviewers of the opinion piece. Datasets that allow external validation of the CRediT ontology identification process have also been produced.

  8. Intellectual Property Investigations by the USITC

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Intellectual Property Investigations by the USITC [Dataset]. https://www.kaggle.com/bigquery/usitc-investigations
    Explore at:
Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
BigQuery: https://cloud.google.com/bigquery
Google: http://google.com/
    Authors
    Google BigQuery
    License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under Section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers, or consignees, of articles that infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked, or misbranded goods; sales at unfairly low prices; other antitrust violations such as price fixing or market division; or goods that violate a standard applicable to such goods.

    Content

    US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

Fork this notebook to get started accessing data in the BigQuery dataset using the BQhelper package to write SQL queries; a sketch follows.
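A hedged sketch that checks the estimated scan size before running anything; the investigations table name is an assumption, so call list_tables() first:

    from bq_helper import BigQueryHelper

    bq = BigQueryHelper("patents-public-data", "usitc_investigations")
    print(bq.list_tables())

    query = """
    SELECT *
    FROM `patents-public-data.usitc_investigations.investigations`
    LIMIT 10
    """
    # Estimated gigabytes the query would scan, before any cost is incurred
    print(bq.estimate_query_size(query))
    df = bq.query_to_pandas_safe(query, max_gb_scanned=1)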

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations

    "US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.

    Banner photo by João Silas on Unsplash

  9. Stack Overflow Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
    Explore at:
Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
Stack Overflow: http://stackoverflow.com/
    License

Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

    Content

    Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: https://archive.org/download/stackexchange

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

    https://cloud.google.com/bigquery/public-data/stackoverflow

Banner Photo by Caspar Rubin from Unsplash.

    Inspiration

    What is the percentage of questions that have been answered over the years?

    What is the reputation and badge count of users across different tenures on StackOverflow?

    What are 10 of the “easier” gold badges to earn?

    Which day of the week has most questions answered within an hour?
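A hedged sketch of the first question using the google-cloud-bigquery client; posts_questions and its answer_count and creation_date columns follow this dataset's published schema:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project-id")

    # Share of questions per year that received at least one answer
    query = """
    SELECT
      EXTRACT(YEAR FROM creation_date) AS year,
      ROUND(100 * COUNTIF(answer_count > 0) / COUNT(*), 1) AS pct_answered
    FROM `bigquery-public-data.stackoverflow.posts_questions`
    GROUP BY year
    ORDER BY year
    """
    print(client.query(query).to_dataframe())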

  10. Hourly Crypto & Stocks Market Data

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    Adrian Julius Aluoch (2025). Hourly Crypto & Stocks Market Data [Dataset]. https://www.kaggle.com/datasets/adrianjuliusaluoch/hourly-crypto-stocks-market-data/discussion
    Explore at:
Available download formats: zip (6626680 bytes)
    Dataset updated
    Nov 15, 2025
    Authors
    Adrian Julius Aluoch
    License

CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

This dataset contains hourly cryptocurrency and stock market data collected from CoinGecko starting in March 2025. The collection pipeline was designed to demonstrate practical data management and automation skills:
- Data Ingestion: a Python script automatically scrapes fresh hourly data from CoinGecko and writes it into Google Sheets.
- Data Offloading: to avoid Google Sheets' row limitations, Python scripts periodically export data from Sheets into Google BigQuery (see the sketch below).
- Data Publishing: the data is shared to Kaggle via a scheduled notebook, ensuring the dataset is updated daily with the latest available records.
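A minimal sketch of the offloading step, assuming rows already pulled from the sheet and a destination table crypto.hourly_prices in your own project (both names are hypothetical); this uses the pandas-gbq package:

    import pandas as pd
    import pandas_gbq

    # Rows that the ingestion step wrote to Google Sheets (illustrative values)
    df = pd.DataFrame({
        "symbol": ["BTC", "ETH"],
        "price_usd": [65000.0, 3500.0],
        "ts": [pd.Timestamp.utcnow()] * 2,
    })

    # Append the hourly batch to BigQuery, creating the table on first run
    pandas_gbq.to_gbq(df, "crypto.hourly_prices",
                      project_id="your-project-id", if_exists="append")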

This setup provides a reliable, reproducible data stream that can be used for:
- Practicing SQL queries for data extraction, filtering, and preparation before analysis
- Exploratory data analysis of crypto and stock price movements
- Building time-series forecasting models
- Studying correlations between global assets
- Demonstrating real-world ETL (Extract, Transform, Load) and data pipeline engineering

The dataset is updated hourly, making it suitable both for live monitoring and historical trend analysis.

  11. Kimia Farma: Performance Analysis 2020-2023

    • kaggle.com
    zip
    Updated Feb 27, 2025
    Cite
    Anggun Dwi Lestari (2025). Kimia Farma: Performance Analysis 2020-2023 [Dataset]. https://www.kaggle.com/datasets/anggundwilestari/kimia-farma-performance-analysis-2020-2023
    Explore at:
Available download formats: zip (30284703 bytes)
    Dataset updated
    Feb 27, 2025
    Authors
    Anggun Dwi Lestari
    Description

This project analyzes Kimia Farma's performance from 2020 to 2023 using Google Looker Studio. The analysis is based on a pre-processed dataset stored in BigQuery, which serves as the data source for the dashboard.

    Project Scope

    The dashboard is designed to provide insights into branch performance, sales trends, customer ratings, and profitability. The development is ongoing, with multiple pages planned for a more in-depth analysis.

    Current Progress

    ✅ The first page of the dashboard is completed
    ✅ A sample dashboard file is available on Kaggle
    🔄 Development will continue with additional pages

    Dataset Overview

The dataset consists of transaction records from Kimia Farma branches across different cities and provinces. Below are the key columns used in the analysis:
- transaction_id: Transaction ID code
- date: Transaction date
- branch_id: Kimia Farma branch ID code
- branch_name: Kimia Farma branch name
- kota: City of the Kimia Farma branch
- provinsi: Province of the Kimia Farma branch
- rating_cabang: Customer rating of the Kimia Farma branch
- customer_name: Name of the customer who made the transaction
- product_id: Product ID code
- product_name: Name of the medicine
- actual_price: Price of the medicine
- discount_percentage: Discount percentage applied to the medicine
- persentase_gross_laba: Gross profit percentage based on the following conditions (reproduced in the sketch after this list):
  Price ≤ Rp 50,000 → 10% profit
  Price > Rp 50,000 - 100,000 → 15% profit
  Price > Rp 100,000 - 300,000 → 20% profit
  Price > Rp 300,000 - 500,000 → 25% profit
  Price > Rp 500,000 → 30% profit
- nett_sales: Price after discount
- nett_profit: Profit earned by Kimia Farma
- rating_transaksi: Customer rating of the transaction
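A hedged sketch of that tier logic as a BigQuery CASE expression; the source table name kimia_farma.kf_final_transaction is an assumption:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project-id")

    # Reproduce persentase_gross_laba from actual_price
    query = """
    SELECT
      transaction_id,
      actual_price,
      CASE
        WHEN actual_price <= 50000  THEN 0.10
        WHEN actual_price <= 100000 THEN 0.15
        WHEN actual_price <= 300000 THEN 0.20
        WHEN actual_price <= 500000 THEN 0.25
        ELSE 0.30
      END AS persentase_gross_laba
    FROM `your-project.kimia_farma.kf_final_transaction`
    LIMIT 1000
    """
    df = client.query(query).to_dataframe()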

    Files Provided

    📌 kimia farma_query.txt – Contains SQL queries used for data analysis in Looker Studio
    📌 kimia farma_analysis_table.csv – Preprocessed dataset ready for import and analysis

📢 Published on: My LinkedIn

  12. Meio Ambiente: Taxa de Precipitação (GOES-16)

    • data.rio
    • datario-pcrj.hub.arcgis.com
• +1 more
    Updated Jun 3, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Taxa de Precipitação (GOES-16) [Dataset]. https://www.data.rio/documents/48c0210e96074b48b401ec2fa4ad99b3
    Explore at:
    Dataset updated
    Jun 3, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

Estimated precipitation rate for areas of southeastern Brazil. Estimates are produced hourly, and each record contains the data for one estimate. Each area is a 4 km by 4 km square. Data are collected by the GOES-16 satellite.

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

    SELECT
        *
    FROM
        `datario.meio_ambiente_clima.taxa_precipitacao_satelite`
    LIMIT
        1000

  Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

  Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>"
    )

  R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000"
    )

  Temporal coverage

  From 2020 to the present

  Update frequency

  Daily

  Managing agency

  Centro de Operações da Prefeitura do Rio (COR)

  Columns

    latitude: Latitude of the center of the area.
    longitude: Longitude of the center of the area.
    rrqpe: Estimated precipitation rate, in millimeters per hour.
    primary_key: Primary key created by concatenating the date, time, latitude, and longitude columns; used to prevent duplicate records.
    horario: Time at which the measurement was taken.
    data_particao: Date on which the measurement was taken.

  Publisher information

  Name: Patrícia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  13. Meio Ambiente: Estações pluviométricas (AlertaRio)

    • data.rio
    • datario-pcrj.hub.arcgis.com
    Updated Jun 2, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Estações pluviométricas (AlertaRio) [Dataset]. https://www.data.rio/documents/cc4863712d65418abd8b2063a50bf453
    Explore at:
    Dataset updated
    Jun 2, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

Data on the rain gauge stations of AlertaRio (the Rio de Janeiro city government's Alerta Rio system) in the city of Rio de Janeiro.

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

    SELECT
        *
    FROM
        `datario.meio_ambiente_clima.estacoes_alertario`
    LIMIT
        1000

  Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

  Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>"
    )

  R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql(
        "SELECT * FROM `datario.meio_ambiente_clima.estacoes_alertario` LIMIT 1000"
    )

  Temporal coverage

  N/A

  Update frequency

  Annual

  Managing agency

  COR

  Columns

    x: X UTM coordinate (SAD69, Zone 23).
    longitude: Longitude where the station is located.
    id_estacao: Station ID defined by AlertaRIO.
    estacao: Station name.
    latitude: Latitude where the station is located.
    cota: Elevation, in meters, at which the station is located.
    endereco: Full address of the station.
    situacao: Indicates whether the station is operational or failing.
    data_inicio_operacao: Date the station began operating.
    data_fim_operacao: Date the station stopped operating.
    data_atualizacao: Last date on which the operation-date information was updated.
    y: Y UTM coordinate (SAD69, Zone 23).

  Publisher information

  Name: Patricia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  14. OpenAQ

    • kaggle.com
    zip
    Updated Dec 1, 2017
    Cite
    Open AQ (2017). OpenAQ [Dataset]. https://www.kaggle.com/datasets/open-aq/openaq
    Explore at:
Available download formats: zip (0 bytes)
    Dataset updated
    Dec 1, 2017
    Dataset authored and provided by
    Open AQ
    License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.

    Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).

    Update Frequency: Weekly

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.openaq.[TABLENAME]. Fork this kernel to get started.
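A minimal sketch with bq_helper, assuming the dataset's global_air_quality table (verify with list_tables()):

    from bq_helper import BigQueryHelper

    bq = BigQueryHelper("bigquery-public-data", "openaq")

    # Highest current PM2.5 readings, one row per measurement
    query = """
    SELECT city, country, value, unit, timestamp
    FROM `bigquery-public-data.openaq.global_air_quality`
    WHERE pollutant = 'pm25' AND value >= 0
    ORDER BY value DESC
    LIMIT 10
    """
    df = bq.query_to_pandas_safe(query, max_gb_scanned=1)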

    Acknowledgements

    Dataset Source: openaq.org

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source and is provided "AS IS" without any warranty, express or implied.

  15. Dados do sistema Comando (COR): procedimento operacional padrao

    • data.rio
    • hub.arcgis.com
    Updated Oct 4, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Dados do sistema Comando (COR): procedimento operacional padrao [Dataset]. https://www.data.rio/documents/b26f700285ab4fde9495f2851adcf3d8
    Explore at:
    Dataset updated
    Oct 4, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

Standard operating procedures (POPs) in use at the PCRJ (Rio de Janeiro city government). A POP is a procedure used to resolve an event and is made up of several activities. An event is an occurrence in the city of Rio de Janeiro that requires monitoring and, in most cases, action by the PCRJ, such as a pothole in the street. Also available through the Data Office API: https://api.dados.rio/v1/

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

    SELECT
        *
    FROM
        `datario.adm_cor_comando.procedimento_operacional_padrao`
    LIMIT
        1000

  Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

  Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.adm_cor_comando.procedimento_operacional_padrao` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>"
    )

  R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql(
        "SELECT * FROM `datario.adm_cor_comando.procedimento_operacional_padrao` LIMIT 1000"
    )

  Temporal coverage

  Not informed.

  Update frequency

  Monthly

  Managing agency

  COR

  Columns

    id_pop: Identifier of the POP (standard operating procedure).
    pop_titulo: Name of the standard operating procedure.

  Publisher information

  Name: Patrícia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  16. Transporte Rodoviário: Viagens dos ônibus identificadas por GPS

    • data.rio
    • datario-pcrj.hub.arcgis.com
    Updated Jun 30, 2023
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2023). Transporte Rodoviário: Viagens dos ônibus identificadas por GPS [Dataset]. https://www.data.rio/documents/eb63f575dc25425e87aea20d3c6a4f6d
    Explore at:
    Dataset updated
    Jun 30, 2023
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

Details of all bus trips identified via GPS. The trip identification algorithm is available at: https://github.com/prefeitura-rio/queries-rj-smtr/tree/master/models/projeto_subsidio_sppo

Based on these data, the transport subsidy is paid, taking into account fulfillment of the total planned mileage of the services. See more details at: https://transportes.prefeitura.rio/subsidio/

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

    SELECT
        *
    FROM
        `datario.transporte_rodoviario_municipal.viagem_onibus`
    LIMIT
        1000

  Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

  Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.transporte_rodoviario_municipal.viagem_onibus` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>"
    )

  R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql(
        "SELECT * FROM `datario.transporte_rodoviario_municipal.viagem_onibus` LIMIT 1000"
    )

  Temporal coverage

  01/06/2022 to the present

  Update frequency

  Daily

  Managing agency

  Secretaria Municipal de Transportes

  Columns (all in the viagem_onibus table)

    data: Date of the trip.
    consorcio: Consortium to which the service belongs.
    tipo_dia: Type of day considered when calculating the planned distance; categories: Dia Útil (weekday), Sábado (Saturday), Domingo (Sunday).
    id_empresa: Identifier code of the company operating the vehicle.
    id_veiculo: Identifier code of the vehicle (order number).
    id_viagem: Identifier code of the trip (e.g. id_veiculo + servico + sentido + shape_id + datetime_partida).
    servico: Service performed by the vehicle (based on the identified route).
    shape_id: Identifier code of the route (shape) operated.
    sentido: Direction of the identified route; categories: I (outbound), V (return), C (circular).
    datetime_partida: Trip start time.
    datetime_chegada: Trip end time.
    tempo_viagem: Measured trip duration (in minutes).
    distancia_planejada: Distance of the planned shape (route).
    perc_conformidade_shape: Percentage of GPS signals emitted within the shape (route) over the course of the trip.
    perc_conformidade_registros: Percentage of trip minutes with a recorded GPS signal.
    versao_modelo: Version of the calculation methodology for the row in the table.

  Publisher information

  Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
  E-mail: dados.smtr@prefeitura.rio
    
  17. NOAA GSOD

    • kaggle.com
    zip
    Updated Aug 30, 2019
    Cite
    NOAA (2019). NOAA GSOD [Dataset]. https://www.kaggle.com/datasets/noaa/gsod
    Explore at:
Available download formats: zip (0 bytes)
    Dataset updated
    Aug 30, 2019
    Dataset provided by
National Oceanic and Atmospheric Administration: http://www.noaa.gov/
    Authors
    NOAA
    License

CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries.

    Content

    Over 9000 stations' data are typically available.

The daily elements included in the dataset (as available from each station) are:
• Mean temperature (.1 Fahrenheit)
• Mean dew point (.1 Fahrenheit)
• Mean sea level pressure (.1 mb)
• Mean station pressure (.1 mb)
• Mean visibility (.1 miles)
• Mean wind speed (.1 knots)
• Maximum sustained wind speed (.1 knots)
• Maximum wind gust (.1 knots)
• Maximum temperature (.1 Fahrenheit)
• Minimum temperature (.1 Fahrenheit)
• Precipitation amount (.01 inches)
• Snow depth (.1 inches)

    Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel

    Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_gsod.[TABLENAME]. Fork this kernel to get started and learn how to safely manage queries against large BigQuery datasets; a sketch follows.
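A hedged sketch with bq_helper; GSOD is split into per-year tables (e.g. gsod2019), and stn and temp are columns in the dataset's published schema:

    from bq_helper import BigQueryHelper

    bq = BigQueryHelper("bigquery-public-data", "noaa_gsod")

    # Hottest stations by average daily mean temperature in 2019
    query = """
    SELECT stn, AVG(temp) AS avg_temp_f
    FROM `bigquery-public-data.noaa_gsod.gsod2019`
    GROUP BY stn
    ORDER BY avg_temp_f DESC
    LIMIT 10
    """
    df = bq.query_to_pandas_safe(query, max_gb_scanned=2)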

    Acknowledgements

    This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations. Dataset Source: NOAA

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Photo by Allan Nygren on Unsplash

  18. Dados do sistema Comando (COR): ocorrencias

    • data.rio
    • datario-pcrj.hub.arcgis.com
• +2 more
    Updated Oct 4, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Dados do sistema Comando (COR): ocorrencias [Dataset]. https://www.data.rio/documents/f9ddaeb4ac754975846716f084645f3d
    Explore at:
    Dataset updated
    Oct 4, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

Occurrences dispatched by the COR since 2015. An occurrence in the city of Rio de Janeiro is an incident that requires monitoring and, in most cases, action by the PCRJ: for example, a pothole in the road, pooled water, or a mechanical breakdown. An open occurrence is one that has not yet been resolved. Also available through the Data Office API: https://api.dados.rio/v1/

  How to access

  On this page

  Here you will find a button to download the data in CSV format, compressed with gzip. For the same result, you can also click here.

  BigQuery

    SELECT
        *
    FROM
        `datario.adm_cor_comando.ocorrencias`
    LIMIT
        1000

  Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.

  Python

    import basedosdados as bd

    # Load the data directly into pandas
    df = bd.read_sql(
        "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000",
        billing_project_id="<id_do_seu_projeto_gcp>"
    )

  R

    install.packages("basedosdados")
    library("basedosdados")

    # Set your Google Cloud project
    set_billing_id("<id_do_seu_projeto_gcp>")

    # Load the data directly into R
    tb <- read_sql(
        "SELECT * FROM `datario.adm_cor_comando.ocorrencias` LIMIT 1000"
    )

  Temporal coverage

  Not informed.

  Update frequency

  Daily

  Managing agency

  COR

  Columns

    data_inicio: Date and time the event was registered at the PCRJ.
    data_fim: Date and time the event was closed at the PCRJ. An event is closed when it is resolved; this field is empty while the event is still open.
    bairro: Neighborhood where the event occurred.
    id_pop: Identifier of the POP.
    status: Event status (ABERTO = open, FECHADO = closed).
    gravidade: Event severity (BAIXO, MEDIO, ALTO, CRITICO).
    prazo: Expected time frame for resolving the event (CURTO, MEDIO (more than 3 days), LONGO (more than 5 days)).
    latitude: Latitude (WGS-84) where the event occurred.
    longitude: Longitude (WGS-84) where the event occurred.
    id_evento: Event identifier.
    descricao: Event description.
    tipo: Event type (PRIMARIO, SECUNDARIO).

  Publisher information

  Name: Patrícia Catandi
  E-mail: patriciabcatandi@gmail.com
    
  19. Chicago Crime

    • kaggle.com
    zip
    Updated Nov 19, 2025
    Cite
    Ashkan Ranjbar (2025). Chicago Crime [Dataset]. https://www.kaggle.com/ashkanranjbar/chicago-crime
    Explore at:
Available download formats: zip (10641044 bytes)
    Dataset updated
    Nov 19, 2025
    Authors
    Ashkan Ranjbar
    License

GNU General Public License 2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    Chicago
    Description

    This dataset has gained popularity over time and is widely known. While Kaggle courses teach how to use Google BigQuery to extract a sample from it, this notebook provides a HOW-TO guide to access the dataset directly within your own notebook. Instead of uploading the entire dataset here, which is quite large, I offer several alternatives to work with a smaller portion of it. My main focus was to demonstrate various techniques to make the dataset more manageable on your own laptop, ensuring smoother operations. Additionally, I've included some interesting insights on basic descriptive statistics and even a modeling example, which can be further explored based on your preferences. I intend to revisit and refine it in the near future to enhance its rigor. Meanwhile, I welcome any suggestions to improve the notebook!
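A hedged sketch of one such technique: pull a bounded recent slice straight from the public table (bigquery-public-data.chicago_crime.crime, the table the Kaggle BigQuery courses use) instead of the full multi-gigabyte dataset:

    from google.cloud import bigquery

    client = bigquery.Client(project="your-project-id")

    # Keep only the columns discussed below and a recent window of years
    query = """
    SELECT date, block, iucr, primary_type, description,
           location_description, arrest, domestic, beat, district,
           ward, community_area, fbi_code, latitude, longitude, year
    FROM `bigquery-public-data.chicago_crime.crime`
    WHERE year >= 2020
    LIMIT 100000
    """
    df = client.query(query).to_dataframe()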

    Here are the columns that I have chosen to include (after carefully eliminating a few others):

    • Date: This column represents the timestamp of the incident. From this column, I have extracted the Month, Day, and Hour information. We can also add additional time-based columns such as Week and Day of the Week, among others.
    • Block: This column provides a partially redacted address where the incident occurred, indicating the same block as the actual address.
    • IUCR: The acronym stands for Illinois Uniform Crime Reporting. This code is directly linked to the Primary Type and Description. You can find more information about it in this link.
    • Primary Type: This column describes the primary category of the IUCR code mentioned above.
    • Description: This column provides a secondary description of the IUCR code, serving as a subcategory of the primary description.
    • Location Description: Here, you can find the description of the location where the incident took place.
    • Arrest: This column indicates whether an arrest was made in relation to the incident.
    • Domestic: It shows whether the incident was domestic-related, as defined by the Illinois Domestic Violence Act.
    • Beat: The beat refers to the smallest police geographic area, with each beat having a dedicated territory. You can find more information about it in this link.
    • District: This column represents the police district where the incident occurred.
    • Ward: It refers to the number that labels the City Council district where the incident took place.
    • Community Areas: This column indicates the community area where the incident occurred. Chicago has a total of 77 community areas.
    • FBI Code: The crime classification outlined in the FBI's National Incident-Based Reporting System (NIBRS).
    • X-Coordinate, Y-Coordinate, Latitude, Longitude, Location: These columns provide information about the geographical coordinates of the incident location, including latitude and longitude. The "Location" column contains just the latitude and longitude coordinates.
    • Year, Updated On: These columns represent the year of the incident and the date on which the dataset was last updated.

    Feel free to explore the notebook and provide any suggestions for improvement. Your feedback is highly appreciated!

  20. Administração de Serviços Públicos: Chamados feitos ao 1746

    • datario-pcrj.hub.arcgis.com
    • data.rio
    Updated Jun 2, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Administração de Serviços Públicos: Chamados feitos ao 1746 [Dataset]. https://datario-pcrj.hub.arcgis.com/documents/52b6bd003abf4b8995ec9860e65a82c5
    Explore at:
    Dataset updated
    Jun 2, 2022
    Dataset authored and provided by
    Prefeitura da Cidade do Rio de Janeiro
    License

Attribution-NoDerivs 3.0 (CC BY-ND 3.0): https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Chamados feitos ao 1746. São chamados desde março de 2011, quando começou o projeto 1746.

      Como acessar
    
    
      Nessa página
    
    
      Aqui, você encontrará um botão para realizar o download dos dados em formato CSV e compactados com gzip. Ou, para mesmo resultado, pode clicar aqui.
    
    
      BigQuery
    
    
    
    
      SELECT
        *
      FROM
        `datario.administracao_servicos_publicos.chamado_1746`
      LIMIT
        1000
    
    
    
    
      Click here to go directly to this table in BigQuery. If you are not familiar with BigQuery,
      see our documentation to learn how to access the data.
    
    
      Python
    
    
    
        import basedosdados as bd

        # Load the data directly into a pandas DataFrame
        df = bd.read_sql(
            "SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000",
            billing_project_id="<id_do_seu_projeto_gcp>"
        )
    
    
    
    
      R
    
    
    
        install.packages("basedosdados")
        library("basedosdados")

        # Set your Google Cloud project
        set_billing_id("<id_do_seu_projeto_gcp>")

        # Load the data directly into R
        tb <- read_sql("SELECT * FROM `datario.administracao_servicos_publicos.chamado_1746` LIMIT 1000")
    
    
    
    
    
    
      Temporal coverage

      March 2011

      Update frequency

      Daily

      Managing agency

      SEGOVI
    
    
    
    
      Columns

      • id_chamado: Unique identifier of the ticket in the database.
      • data_inicio: Date the ticket was opened, recorded when the operator registers it.
      • data_fim: Date the ticket was closed. A ticket is closed when the request is fulfilled or when it is found that it cannot be fulfilled.
      • id_bairro: Unique database identifier of the neighborhood where the event behind the ticket occurred.
      • id_territorialidade: Unique database identifier of the territoriality where the event occurred. A territoriality is a region of the city of Rio de Janeiro overseen by a specific agency, e.g. CDURP, which is responsible for the port region of Rio de Janeiro.
      • id_logradouro: Unique database identifier of the street where the event occurred.
      • numero_logradouro: Street number where the event occurred.
      • id_unidade_organizacional: Unique database identifier of the agency that handles the ticket, e.g. the identifier of COMLURB when the ticket concerns urban cleaning.
      • nome_unidade_organizacional: Name of the agency that handles the request, e.g. COMLURB when the request concerns urban cleaning.
      • unidade_organizadional_ouvidoria: Boolean indicating whether the citizen's ticket was filed through the Ombudsman's Office (1) or not (0).
      • categoria: Ticket category, e.g. service, information, suggestion, praise, complaint, criticism.
      • id_tipo: Unique database identifier of the ticket type, e.g. public lighting.
      • tipo: Name of the ticket type, e.g. public lighting.
      • id_subtipo: Unique database identifier of the ticket subtype, e.g. repair of a burned-out street lamp.
      • subtipo: Name of the ticket subtype, e.g. repair of a burned-out street lamp.
      • status: Ticket status, e.g. closed with resolution, open in progress, pending.
      • longitude: Longitude of the place of the event that motivated the ticket.
      • latitude: Latitude of the place of the event that motivated the ticket.
      • data_alvo_finalizacao: Target date for fulfilling the ticket. When prazo_tipo is D, this stays blank until the diagnosis is made.
      • data_alvo_diagnostico: Target date for diagnosing the service. When prazo_tipo is F, this date stays blank.
      • data_real_diagnostico: Date the service diagnosis was actually made. When prazo_tipo is F, this date stays blank.
      • tempo_prazo: Deadline for performing the service, in days or hours after the ticket is opened; when a diagnosis is required, the deadline counts from the diagnosis instead.
      • prazo_unidade: Unit of time used in the deadline: days (D) or hours (H).
      • prazo_tipo: Diagnosis or finalization (D or F); indicates whether the ticket requires a diagnosis. Some services need an assessment before they can be performed, e.g. tree pruning, where an environmental engineer must verify whether the pruning is needed. (A worked example combining the deadline columns follows this list.)
      • id_unidade_organizacional_mae: ID of the parent organizational unit of the agency that handles the request. For example, "CVA - Coordenação de Vigilância de Alimentos" handles the request and reports to the parent unit "IVISA-RIO - Instituto Municipal de Vigilância Sanitária, de Zoonoses e de Inspeção Agropecuária"; this column holds the ID of the latter.
      • situacao: Indicates whether the ticket has been closed.
      • tipo_situacao: Current status of the ticket, one of Atendido (fulfilled), Atendido parcialmente (partially fulfilled), Não atendido (not fulfilled), Não constatado (not verified), or Andamento (in progress).
      • dentro_prazo: Indicates whether the ticket's target completion date is still within the stipulated deadline.
      • justificativa_status: Justification agencies give when setting the status, e.g. "SEM POSSIBILIDADE DE ATENDIMENTO - justificativa: Fora de área de atuação do municipio" (no possibility of service - justification: outside the municipality's jurisdiction).
      • reclamacoes: Number of complaints.
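
      As a worked illustration of how the deadline columns combine, the following hypothetical pandas sketch (not official data.rio code; it assumes a DataFrame df loaded with the snippet above, with the date columns already parsed as datetimes) derives each ticket's effective deadline and whether it was closed on time:

        import pandas as pd

        # The deadline counts from the diagnosis date when a diagnosis is
        # required (prazo_tipo == "D"), otherwise from the opening date
        start = df["data_inicio"].where(df["prazo_tipo"].ne("D"),
                                        df["data_real_diagnostico"])

        # tempo_prazo is in days or hours depending on prazo_unidade ("D"/"H");
        # normalize everything to hours before building a timedelta
        hours = df["tempo_prazo"] * df["prazo_unidade"].map({"D": 24, "H": 1})
        deadline = start + pd.to_timedelta(hours, unit="h")

        # A ticket was closed on time if it was closed at all and by the deadline
        closed_on_time = df["data_fim"].notna() & (df["data_fim"] <= deadline)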
    
    
    
    
    
    
    
      Publisher details

      Name: Patricia Catandi
      E-mail: patriciabcatandi@gmail.com
    