6 datasets found
  1. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/bigquery/patents
    Available download formats: zip (0 bytes)
    Dataset updated: Sep 19, 2018
    Dataset provided by: Google (http://google.com/); BigQuery (https://cloud.google.com/bigquery)
    Authors: Google BigQuery
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  2. COKI Language Dataset

    • zenodo.org
    application/gzip, csv
    Updated Jun 16, 2022
    Cite
    James P. Diprose; Cameron Neylon (2022). COKI Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.6636625
    Available download formats: application/gzip, csv
    Dataset updated: Jun 16, 2022
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: James P. Diprose; Cameron Neylon
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

    Methodology
    A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with BigQuery, then downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph (MAG). The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
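The fallback and prediction steps described above can be sketched roughly as follows. This is a minimal illustration, not the script the authors used: the function names, the `StubModel` class, and the choice to fall back to MAG only when both Crossref fields are empty are all assumptions. The only fastText behaviour relied on is that a loaded model exposes `predict(text, k)` returning labels of the form `__label__xx` alongside probabilities, and that input text must not contain newlines.

```python
from typing import Optional, Protocol, Sequence, Tuple


class LanguageModel(Protocol):
    """Anything with a fastText-style predict(text, k) method."""

    def predict(self, text: str, k: int = 1) -> Tuple[Sequence[str], Sequence[float]]: ...


def build_text(title: Optional[str], abstract: Optional[str]) -> str:
    """Concatenate title and abstract, collapsing all whitespace (including newlines)."""
    parts = [p for p in (title, abstract) if p]
    return " ".join(" ".join(parts).split())


def choose_text(crossref: Tuple[Optional[str], Optional[str]],
                mag: Tuple[Optional[str], Optional[str]]) -> str:
    """Prefer Crossref text; fall back to MAG only when Crossref yields nothing (assumed rule)."""
    text = build_text(*crossref)
    return text if text else build_text(*mag)


def predict_language(model: LanguageModel, text: str) -> Optional[Tuple[str, float]]:
    """Return (language code, probability) for the top prediction, or None for empty text."""
    if not text:
        return None
    labels, probs = model.predict(text, k=1)
    return labels[0].replace("__label__", ""), float(probs[0])


class StubModel:
    """Hypothetical stand-in so the sketch runs without downloading lid.176.bin."""

    def predict(self, text, k=1):
        return ("__label__en",), (0.99,)
```

With the real library, a model loaded via `fasttext.load_model("lid.176.bin")` could be passed in place of `StubModel()`, since it offers the same `predict` call.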

    Query or Download
    The data is publicly accessible in BigQuery in the following two tables:

    When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.

    See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.

    Code
    The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language

    License
    COKI Language Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

    Attributions
    This work contains information from:

    References
    [1] https://doi.org/10.5281/zenodo.6366695
    [2] https://fasttext.cc/docs/en/language-identification.html
    [3] https://modelpredict.com/language-identification-survey

  3. gnomAD

    • console.cloud.google.com
    Updated Jun 23, 2020
    Cite
    Broad Institute of MIT and Harvard (2020). gnomAD [Dataset]. https://console.cloud.google.com/marketplace/product/broad-institute/gnomad
    Dataset updated: Jun 23, 2020
    Dataset provided by: Google (http://google.com/)
    Description

    The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer-range-partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated with a “_chr*” suffix). Using the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them into BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support. These public datasets are included in BigQuery's free tier of 1 TB of processing per month: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
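To make the sharding concrete, the sketch below picks the single per-chromosome table to query instead of scanning all 24. It assumes the 24 shards correspond to chromosomes 1 to 22 plus X and Y, and that the shard name is the base table name followed by `_chr` and the chromosome. The base name used in the test is hypothetical; the exact naming should be checked against the dataset listing.

```python
# The 24 assumed chromosome shards: autosomes 1-22 plus X and Y.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def shard_table(base_table: str, chromosome: str) -> str:
    """Return the name of the shard assumed to hold variants for one chromosome."""
    chrom = chromosome.upper()
    if chrom.startswith("CHR"):
        chrom = chrom[3:]  # accept both "chr7" and "7"
    if chrom not in CHROMOSOMES:
        raise ValueError(f"unknown chromosome: {chromosome!r}")
    return f"{base_table}_chr{chrom}"
```

Restricting a query to one shard means BigQuery only bills for the bytes in that table, which is where the cost reduction mentioned above comes from.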

  4. Intellectual Property Investigations by the USITC

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Intellectual Property Investigations by the USITC [Dataset]. https://www.kaggle.com/bigquery/usitc-investigations
    Available download formats: zip (0 bytes)
    Dataset updated: Feb 12, 2019
    Dataset provided by: Google (http://google.com/); BigQuery (https://cloud.google.com/bigquery)
    Authors: Google BigQuery
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under Section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers or consignees, of articles which infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked or misbranded goods; goods sold at unfairly low prices; other antitrust violations such as price fixing or market division; or goods that violate a standard applicable to them.

    Content

    US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations

    "US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.

    Banner photo by João Silas on Unsplash

  5. Transporte Rodoviário: Histórico de GPS do BRT (Road Transport: BRT GPS History)

    • data.rio
    • datariov2-pcrj.hub.arcgis.com
    Updated Jun 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Transporte Rodoviário: Histórico de GPS do BRT [Dataset]. https://www.data.rio/documents/a17608e589864376bfad313e026c4681
    Dataset updated: Jun 8, 2022
    Dataset authored and provided by: Prefeitura da Cidade do Rio de Janeiro
    License: Attribution-NoDerivs 3.0 (CC BY-ND 3.0), https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Historical geographic position data for BRT vehicles.

    The complete data is available for querying and download in the data.rio data lake. Data is captured every minute and processed every hour. Data is subject to change, such as fixes for capture gaps and/or processing adjustments.

    How to access

    On this page

    Here you will find a button to download the data in CSV format, compressed with gzip. Alternatively, for the same result, you can click here.
    

    BigQuery

        SELECT *
        FROM `datario.transporte_rodoviario_municipal.gps_brt`
        LIMIT 1000
    
    Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
    

    Python

        import basedosdados as bd

        # Load the data directly into pandas
        df = bd.read_sql(
            "SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000",
            billing_project_id="<your_gcp_project_id>",
        )
    

    R

        install.packages("basedosdados")
        library("basedosdados")

        # Set your Google Cloud project
        set_billing_id("<your_gcp_project_id>")

        # Load the data directly into R
        tb <- read_sql("SELECT * FROM `datario.transporte_rodoviario_municipal.gps_brt` LIMIT 1000")
    

    Temporal coverage: Nov 24, 2021 to the present

    Update frequency: Hourly

    Managing agency: Secretaria Municipal de Transportes (SMTR)

    Columns

        Name                        Description
        modo                        BRT; this table contains only this mode.
        timestamp_gps               Timestamp at which the GPS signal was emitted.
        data                        Date of the GPS signal emission timestamp.
        hora                        Time of the GPS signal emission timestamp.
        id_veiculo                  Vehicle identifier code (order number).
        servico                     Service performed by the vehicle.
        latitude                    Part of the geographic coordinate (y axis) in decimal degrees (EPSG:4326 - WGS84).
        longitude                   Part of the geographic coordinate (x axis) in decimal degrees (EPSG:4326 - WGS84).
        flag_em_movimento           Vehicles with 'velocidade' below 'velocidade_limiar_parado' are considered stopped (false); otherwise they are considered moving (true).
        tipo_parada                 Identifies vehicles stopped at terminals or garages.
        flag_linha_existe_sigmob    Flag indicating whether the reported line exists in SIGMOB.
        velocidade_instantanea      Instantaneous vehicle speed, as reported by the GPS (km/h).
        velocidade_estimada_10_min  Average speed over the last 10 minutes of operation (km/h).
        distancia                   Distance from the last GPS position to the current position (m).
        versao                      Data version control code (GitHub SHA).
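Two of the derived columns above, flag_em_movimento and distancia, can be illustrated with a short sketch. It is only one plausible reading of the column descriptions: the threshold value is not given in the source and is left as a parameter, and the great-circle (haversine) formula with a mean Earth radius is an assumption about how a distance in metres might be obtained from WGS84 coordinates, not the publisher's documented method.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres; an assumption of this sketch


def is_moving(speed_kmh: float, stopped_threshold_kmh: float) -> bool:
    """Mirror flag_em_movimento: below the threshold is stopped (False), otherwise moving (True)."""
    return speed_kmh >= stopped_threshold_kmh


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

Calling `haversine_m` on the previous and current latitude/longitude of a vehicle gives a value comparable in spirit to the distancia column, while `is_moving` reproduces the true/false rule once a threshold is chosen.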
    

    Publisher details

    Name: Subsecretaria de Tecnologia em Transportes (SUBTT)
    E-mail: dados.smtr@prefeitura.rio
    
  6. Meio Ambiente: Taxa de Precipitação (GOES-16) (Environment: Precipitation Rate, GOES-16)

    • data.rio
    • hub.arcgis.com
    Updated Jun 2, 2022
    Cite
    Prefeitura da Cidade do Rio de Janeiro (2022). Meio Ambiente: Taxa de Precipitação (GOES-16) [Dataset]. https://www.data.rio/documents/48c0210e96074b48b401ec2fa4ad99b3
    Dataset updated: Jun 2, 2022
    Dataset authored and provided by: Prefeitura da Cidade do Rio de Janeiro
    License: Attribution-NoDerivs 3.0 (CC BY-ND 3.0), https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Estimated precipitation rate for areas of southeastern Brazil. Estimates are made hourly, and each record contains the data for one estimate. Each area is a square with 4 km sides. Data collected by the GOES-16 satellite.

    How to access

    On this page

    Here you will find a button to download the data in CSV format, compressed with gzip. Alternatively, for the same result, you can click here.
    
    
    BigQuery

        SELECT *
        FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite`
        LIMIT 1000
    
    
    
    
    Click here to go directly to this table in BigQuery. If you have no experience with BigQuery, see our documentation to learn how to access the data.
    
    
    Python

        import basedosdados as bd

        # Load the data directly into pandas
        df = bd.read_sql(
            "SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000",
            billing_project_id="<your_gcp_project_id>",
        )
    
    
    
    
    R

        install.packages("basedosdados")
        library("basedosdados")

        # Set your Google Cloud project
        set_billing_id("<your_gcp_project_id>")

        # Load the data directly into R
        tb <- read_sql("SELECT * FROM `datario.meio_ambiente_clima.taxa_precipitacao_satelite` LIMIT 1000")
    
    
    
    
    
    
    Temporal coverage: From 2020 to the present

    Update frequency: Daily

    Managing agency: Centro de Operações da Prefeitura do Rio (COR)
    
    
    
    
    Columns

        Name           Description
        latitude       Latitude of the center of the area.
        longitude      Longitude of the center of the area.
        rrqpe          Estimated precipitation rate, measured in millimeters per hour.
        primary_key    Primary key created by concatenating the date, time, latitude and longitude columns. Used to avoid duplicate data.
        horario        Time at which the measurement was taken.
        data_particao  Date on which the measurement was taken.
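The primary_key column described above can be sketched as a simple concatenation used for deduplication. The separator, the column order, and the choice of data_particao as the date column are assumptions of this illustration; the source only states that date, time, latitude and longitude are concatenated.

```python
from typing import Dict, Iterable, List


def make_primary_key(row: Dict[str, str], sep: str = "_") -> str:
    """Concatenate date, time, latitude and longitude into one key (separator is assumed)."""
    columns = ("data_particao", "horario", "latitude", "longitude")
    return sep.join(str(row[col]) for col in columns)


def drop_duplicates(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first row seen for each primary key."""
    seen = set()
    unique = []
    for row in rows:
        key = make_primary_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Because each 4 km area is identified by its center coordinates and each estimate by its date and time, two rows sharing all four values describe the same measurement, which is why such a key is enough to filter repeats.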
    
    
    
    
    
    
    
    Publisher details

    Name: Patrícia Catandi
    E-mail: patriciabcatandi@gmail.com
    