4 datasets found
  1. BigQuery GIS Utility Datasets (U.S.)

    Useful shapefiles for GIS (BigQuery)

    • kaggle.com
    zip (0 bytes)
    Updated Mar 20, 2019
    Cite: Google BigQuery (2019). BigQuery GIS Utility Datasets (U.S.) [Dataset]. https://www.kaggle.com/bigquery/utility-us
    Provided by: BigQuery (https://cloud.google.com/bigquery), Google (http://google.com/)
    Authors: Google BigQuery
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.utility_us.[TABLENAME].

    • Project: "bigquery-public-data"
    • Dataset: "utility_us"

    Fork this kernel to get started, and to learn how to safely analyze large BigQuery datasets.

    If you're using Python, you can start with this code:

    import pandas as pd
    from bq_helper import BigQueryHelper

    # Point the helper at the utility_us dataset in the bigquery-public-data project.
    bq_assistant = BigQueryHelper("bigquery-public-data", "utility_us")
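
    From there, a minimal sketch of a safe workflow might look like the following. The bq_helper methods shown (list_tables, head, estimate_query_size, query_to_pandas_safe) come from the bq_helper package; the us_states_area table and state_name column are assumptions, so check list_tables() for the real names first:

    # List the tables in the dataset and preview one before running anything costly.
    print(bq_assistant.list_tables())
    print(bq_assistant.head("us_states_area", num_rows=3))  # table name is an assumption

    # Estimate how much data a query would scan, then run it with a hard scan cap.
    query = """
        SELECT state_name
        FROM `bigquery-public-data.utility_us.us_states_area`
        LIMIT 10
    """
    print(bq_assistant.estimate_query_size(query))  # estimated GB scanned
    df = bq_assistant.query_to_pandas_safe(query, max_gb_scanned=1)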
    
  2. FitBit Fitness Tracker Data (revised)

    • kaggle.com
    zip (12,763,010 bytes)
    Updated Dec 17, 2022 (more versions available)
    Cite: duart2688 (2022). FitBit Fitness Tracker Data (revised) [Dataset]. https://www.kaggle.com/duart2688/fitabase-data-cleaned-using-sql
    Authors: duart2688
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Content

    This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.

    Main modifications

    This is the list of manipulations performed on the original dataset, published by Möbius. All of the cleaning and rearrangement was performed in BigQuery, using SQL functions.

    1) After taking a closer look at the source dataset, I realized that for my case study I did not need some of the tables contained in the original archive. Therefore, I decided not to import
    - dailyCalories_merged.csv,
    - dailyIntensities_merged.csv,
    - dailySteps_merged.csv,
    as they proved redundant: their content can be found in the dailyActivity_merged.csv file. In addition, the files
    - minutesCaloriesWide_merged.csv,
    - minutesIntensitiesWide_merged.csv,
    - minuteStepsWide_merged.csv
    were not imported, as they present the same data contained in other files in a wide format. Hence, only the long-format files containing the same data were imported into the BigQuery database.

    2) To be able to compare and measure the correlation among different variables based on hourly records, I created a new table using a LEFT JOIN on the Id and ActivityHour columns. I repeated the same JOIN on the tables with minute records. I thereby obtained two new tables: hourly_activity.csv and minute_activity.csv. A sketch of what such a join might look like follows.
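
    A minimal sketch of the hourly join described above, written against the google-cloud-bigquery client; the project, dataset, table, and column names here are assumptions based on the file names, not the author's actual schema:

    from google.cloud import bigquery

    client = bigquery.Client(project="your_project")  # update as needed

    # LEFT JOIN hourly calories, intensities, and steps on Id and ActivityHour.
    query = """
        SELECT
            cal.Id,
            cal.ActivityHour,
            cal.Calories,
            inten.TotalIntensity,
            stp.StepTotal
        FROM `your_project.fitbit.hourly_calories` AS cal
        LEFT JOIN `your_project.fitbit.hourly_intensities` AS inten
            ON cal.Id = inten.Id AND cal.ActivityHour = inten.ActivityHour
        LEFT JOIN `your_project.fitbit.hourly_steps` AS stp
            ON cal.Id = stp.Id AND cal.ActivityHour = stp.ActivityHour
    """
    hourly_activity = client.query(query).to_dataframe()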

    3) To convert to valid DATE and DATETIME types most of the columns that had been imported as STRING, I used the PARSE_DATE() and PARSE_DATETIME() functions. While importing the
    - heartrate_seconds_merged.csv,
    - hourlyCalories_merged.csv,
    - hourlyIntensities_merged.csv,
    - hourlySteps_merged.csv,
    - minutesCaloriesNarrow_merged.csv,
    - minuteIntensitiesNarrow_merged.csv,
    - minuteMETsNarrow_merged.csv,
    - minuteSleep_merged.csv,
    - minuteSteps_merged.csv,
    - sleepDay_merged.csv,
    - weightLogInfo_merged.csv
    files into BigQuery, it was necessary to import the DATETIME and DATE columns as STRING, because the original syntax used in the CSV files could not be recognized as a valid DATETIME data type, due to the "AM" and "PM" text at the end of the expression. An example of the conversion appears below.
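
    A minimal sketch of that conversion, assuming a timestamp string such as "4/12/2016 2:00:00 AM"; the table and column names are again hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client(project="your_project")  # update as needed

    # Re-type the STRING timestamps with PARSE_DATETIME;
    # %I is the 12-hour clock and %p consumes the trailing AM/PM.
    query = """
        SELECT
            Id,
            PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', ActivityHour) AS activity_hour
        FROM `your_project.fitbit.hourly_calories_raw`
    """
    typed = client.query(query).to_dataframe()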

    Acknowledgements

    1. Möbius' version of the data set can be found here.
    2. Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa. https://zenodo.org/record/53894
  3. Intellectual Property Investigations by the USITC

    • kaggle.com
    zip (0 bytes)
    Updated Feb 12, 2019
    Cite: Google BigQuery (2019). Intellectual Property Investigations by the USITC [Dataset]. https://www.kaggle.com/bigquery/usitc-investigations
    Provided by: BigQuery (https://cloud.google.com/bigquery), Google (http://google.com/)
    Authors: Google BigQuery
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Context

    Section 337 of the Tariff Act of 1930 governs investigations of unfair practices in import trade. Under Section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the subsequent sale by owners, importers, or consignees, of articles that infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry, prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked, or misbranded goods; sales at unfairly low prices; other antitrust violations such as price fixing or market division; or goods that violate a standard applicable to such goods.

    Content

    The US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations conducted under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights, and other forms of unfair competition in import trade, to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

    Fork this notebook to get started accessing the data in the BigQuery dataset, using the bq_helper package to write SQL queries; a starter sketch follows.
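
    A minimal sketch of such a starter, reusing the bq_helper pattern from the first dataset on this page; the query's table and column names are assumptions, so list the real tables first:

    from bq_helper import BigQueryHelper

    # Helper for the usitc_investigations dataset in the patents-public-data project.
    bq_assistant = BigQueryHelper("patents-public-data", "usitc_investigations")
    print(bq_assistant.list_tables())

    # Hypothetical query: investigations filed per year (table and column names assumed).
    query = """
        SELECT EXTRACT(YEAR FROM CAST(filing_date AS DATE)) AS year, COUNT(*) AS n
        FROM `patents-public-data.usitc_investigations.investigations`
        GROUP BY year
        ORDER BY year
    """
    print(bq_assistant.estimate_query_size(query))  # estimated GB scanned
    df = bq_assistant.query_to_pandas_safe(query, max_gb_scanned=1)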

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations

    "US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.

    Banner photo by João Silas on Unsplash

  4. Data and code for: The Contributor Role Taxonomy (CRediT) at ten: a retrospective analysis of the diversity of contributions to published research output

    • figshare.com
    pdf
    Updated Nov 19, 2025
    Cite: Simon Porter; Ruth Whittam; Liz Allen; Veronique Kiermer (2025). Data and code for: The Contributor Role Taxonomy (CRediT) at ten: a retrospective analysis of the diversity of contributions to published research output [Dataset]. http://doi.org/10.6084/m9.figshare.28816703.v1
    Provided by: Figshare (http://figshare.com/)
    Authors: Simon Porter; Ruth Whittam; Liz Allen; Veronique Kiermer
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    About this notebook

    This notebook was created using a helper script, Base.ipynb, which has helper functions that push output directly to Datawrapper to generate the graphs included in the opinion piece. To run without the helper functions, with BigQuery alone, use

    !pip install google-cloud-bigquery

    then add:

    from google.cloud import bigquery
    from google.cloud.bigquery import magics

    project_id = "your_project"  # update as needed
    magics.context.project = project_id
    bq_params = {}
    client = bigquery.Client(project=project_id)
    %load_ext google.cloud.bigquery

    and finally comment out the make_chart lines.

    About dimensions-ai-integrity.ds_dp_pipeline_ripeta_staging.trust_markers_raw

    dimensions-ai-integrity.ds_dp_pipeline_ripeta_staging.trust_markers_raw is an internal table that is the result of running a process over the text of publications in order to identify trust-marker segments, including author contributions. The process works as follows. It aims to automatically segment research papers into their constituent sections. It operates by identifying headings within the text based on a pre-defined set of patterns and a rule-based system. The system first cleans and normalizes the input text. It then employs regular expressions to detect potential section headings. These potential headings are validated against a set of rules that consider factors such as capitalization, the context of surrounding words, and the typical order of sections within a research paper (e.g., certain sections not appearing after "References" or before "Abstract"). Specific rules also handle exceptions for particular heading types, such as "Keywords" or "Appendices". Once valid headings are identified, the system extracts the corresponding textual content for each section. The output is a structured representation of the paper, categorizing text segments under their respective heading types. Any text that does not fall under a recognized heading is identified as unlabeled content. The overall process aims to provide a structured understanding of the document's organization for subsequent analysis.

    Author Contributions segments are identified using the following regex:

    "author_contributions": [
        "((credit|descript(ion(?:s)?|ive)| )*author(s|'s|ship|s')?( |contribution(?:s)?|statement(?:s)?|role(?:s)?){2,})",
        "contribution(?:s)"
    ]

    Access to dimensions-ai-integrity.ds_dp_pipeline_ripeta_staging.trust_markers_raw is available to peer reviewers of the opinion piece. Datasets that allow external validation of the CRediT contribution-identification process have also been produced.
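
    A minimal sketch of how those patterns might be applied to candidate headings; the matching helper and the case-insensitive, full-match treatment are assumptions, and the actual pipeline also applies the validation rules described above:

    import re

    # The two author-contributions patterns quoted in the description.
    patterns = [
        r"((credit|descript(ion(?:s)?|ive)| )*author(s|'s|ship|s')?"
        r"( |contribution(?:s)?|statement(?:s)?|role(?:s)?){2,})",
        r"contribution(?:s)",
    ]

    def looks_like_author_contributions(heading):
        # Hypothetical helper: lowercase the heading and require a full match.
        text = heading.strip().lower()
        return any(re.fullmatch(p, text) for p in patterns)

    for h in ["Author Contributions", "CRediT authorship contribution statement",
              "Contributions", "Methods"]:
        print(h, "->", looks_like_author_contributions(h))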
