4 datasets found

Post-Processing National Water Model Long-Range Forecasts with Random Forest...
search.dataone.org
hydroshare.org
Updated Dec 14, 2024
Cite
Jacob Anderson (2024). Post-Processing National Water Model Long-Range Forecasts with Random Forest Regression in the Cloud to Improve Forecast Accuracy for Decision-Makers and Water Managers - Script/Data [Dataset]. https://search.dataone.org/view/sha256%3A50abc8f187746159df8ac98d1a6eda224082e6ee902ab18f6d55f7d151291447
Dataset updated
Dec 14, 2024
Dataset provided by
Hydroshare
Authors
Jacob Anderson
Description
This resource contains the Python script run within the Google Cloud Console to bias correct the NWM long-range forecasts.
Intellectual Property Investigations by the USITC
kaggle.com
zip
Updated Feb 12, 2019
Cite
Google BigQuery (2019). Intellectual Property Investigations by the USITC [Dataset]. https://www.kaggle.com/bigquery/usitc-investigations
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Context

Section 337, Tariff Act of 1930, Investigations of Unfair Practices in Import Trade. Under section 337, the USITC determines whether there is unfair competition in the importation of products into, or their subsequent sale in, the United States. Section 337 prohibits the importation into the US, or the sale by owners, importers or consignees, of articles which infringe a patent, copyright, trademark, or semiconductor mask work, or where unfair competition or unfair acts exist that can destroy or substantially injure a US industry or prevent one from developing, or restrain or monopolize trade in US commerce. These latter categories are very broad: unfair competition can involve counterfeit, mismarked or misbranded goods, the sale of goods at unfairly low prices, other antitrust violations such as price fixing or market division, or goods that violate a standard applicable to such goods.

Content

US International Trade Commission 337Info Unfair Import Investigations Information System contains data on investigations done under Section 337. Section 337 declares the infringement of certain statutory intellectual property rights and other forms of unfair competition in import trade to be unlawful practices. Most Section 337 investigations involve allegations of patent or registered trademark infringement.

Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

Acknowledgements

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:usitc_investigations

"US International Trade Commission 337Info Unfair Import Investigations Information System" by the USITC, for public use.

Banner photo by João Silas on Unsplash
FitBit Fitness Tracker Data (revised)
kaggle.com
zip
Updated Dec 17, 2022
Cite
duart2688 (2022). FitBit Fitness Tracker Data (revised) [Dataset]. https://www.kaggle.com/duart2688/fitabase-data-cleaned-using-sql
Available download formats: zip (12763010 bytes)
Dataset updated
Dec 17, 2022
Authors
duart2688
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

This dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.

Main modifications

This is the list of manipulations performed on the original dataset, published by Möbius. All cleaning and rearrangement was performed in BigQuery, using SQL functions.

1) After I took a closer look at the source dataset, I realized that for my case study, I did not need some of the tables contained in the original archive. Therefore, I decided not to import
- dailyCalories_merged.csv,
- dailyIntensities_merged.csv,
- dailySteps_merged.csv,
as they proved redundant: their content can be found in the dailyActivity_merged.csv file. In addition, the files
- minutesCaloriesWide_merged.csv,
- minutesIntensitiesWide_merged.csv,
- minuteStepsWide_merged.csv
were not imported, as they present, in wide format, the same data contained in other files. Hence, only the long-format files containing the same data were imported into the BigQuery database.

2) To be able to compare and measure the correlation among different variables based on hourly records, I created a new table using a LEFT JOIN on the columns Id and ActivityHour. I repeated the same JOIN on the tables with minute records. This produced 2 new tables:
- hourly_activity.csv,
- minute_activity.csv.

3) While importing the files
- heartrate_seconds_merged.csv,
- hourlyCalories_merged.csv,
- hourlyIntensities_merged.csv,
- hourlySteps_merged.csv,
- minutesCaloriesNarrow_merged.csv,
- minuteIntensitiesNarrow_merged.csv,
- minuteMETsNarrow_merged.csv,
- minuteSleep_merged.csv,
- minuteSteps_merged.csv,
- sleepDay_merge.csv,
- weigthLog_Info_merged.csv
to BigQuery, it was necessary to import the DATETIME and DATE type columns as STRING, because the original syntax used in the CSV files could not be recognized as a correct DATETIME data type, due to the “AM” and “PM” text at the end of the expression. To validate most of these columns afterwards, I used the PARSE_DATE() and PARSE_DATETIME() commands.
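
Steps 2 and 3 above can be sketched outside BigQuery as follows. This is only an illustration, not the author's SQL: it uses Python's standard library (datetime and an in-memory SQLite database) in place of PARSE_DATETIME() and BigQuery tables. The timestamp layout, the table names, the value columns Calories and StepTotal, and the sample rows are all assumptions made for the example; only the join keys Id and ActivityHour come from the description.

```python
import sqlite3
from datetime import datetime

# Assumed 12-hour layout with a trailing AM/PM marker, e.g. "4/12/2016 1:00:00 PM".
RAW_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_activity_hour(raw: str) -> str:
    """Turn a 12-hour AM/PM string into a sortable 24-hour datetime string."""
    return datetime.strptime(raw, RAW_FORMAT).isoformat(sep=" ")


def build_hourly_activity(calories_rows, steps_rows):
    """LEFT JOIN two hourly tables on Id and ActivityHour (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hourly_calories (Id INTEGER, ActivityHour TEXT, Calories INTEGER)"
    )
    conn.execute(
        "CREATE TABLE hourly_steps (Id INTEGER, ActivityHour TEXT, StepTotal INTEGER)"
    )
    # Parse the raw strings before loading, so both tables share one key format.
    conn.executemany(
        "INSERT INTO hourly_calories VALUES (?, ?, ?)",
        [(i, parse_activity_hour(t), v) for i, t, v in calories_rows],
    )
    conn.executemany(
        "INSERT INTO hourly_steps VALUES (?, ?, ?)",
        [(i, parse_activity_hour(t), v) for i, t, v in steps_rows],
    )
    rows = conn.execute(
        """
        SELECT c.Id, c.ActivityHour, c.Calories, s.StepTotal
        FROM hourly_calories AS c
        LEFT JOIN hourly_steps AS s
          ON c.Id = s.Id AND c.ActivityHour = s.ActivityHour
        ORDER BY c.ActivityHour
        """
    ).fetchall()
    conn.close()
    return rows


# Made-up sample rows: (Id, ActivityHour, value).
sample_calories = [
    (1, "4/12/2016 12:00:00 AM", 81),
    (1, "4/12/2016 1:00:00 PM", 61),
]
sample_steps = [
    (1, "4/12/2016 12:00:00 AM", 373),
]

hourly_activity = build_hourly_activity(sample_calories, sample_steps)
for row in hourly_activity:
    print(row)
```

Because the join is a LEFT JOIN, every row of the left table is kept; hours with no match in the right table come back with an empty (NULL/None) value instead of being dropped.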

Acknowledgement

Möbius' version of the data set can be found here.

Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa. https://zenodo.org/record/53894#.YMoUpnVKiP9-
gnomAD
console.cloud.google.com
Updated Jul 25, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Broad%20Institute%20of%20MIT%20and%20Harvard&hl=zh_TW (2023). gnomAD [Dataset]. https://console.cloud.google.com/marketplace/product/broad-institute/gnomad?hl=zh_TW
Dataset updated
Jul 25, 2023
Dataset provided by
Google (http://google.com/)
Description
The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables. Each dataset is sharded by chromosome, meaning variants are distributed across 24 tables (indicated with a “_chr*” suffix). Utilizing the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support.
These public datasets are included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud. Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
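
The per-chromosome sharding described above can be illustrated with a small sketch that maps a chromosome to its shard. It assumes the 24 shards correspond to chromosomes 1 to 22 plus X and Y, which the description does not spell out, and it uses a hypothetical table prefix (example_gnomad_table), since the real table names are not given here.

```python
# Hypothetical prefix; the real gnomAD table names are not stated in the description.
TABLE_PREFIX = "example_gnomad_table"


def shard_suffixes():
    """Return the 24 "_chr*" suffixes, assuming chromosomes 1-22 plus X and Y."""
    chromosomes = [str(n) for n in range(1, 23)] + ["X", "Y"]
    return [f"_chr{c}" for c in chromosomes]


def table_for_chromosome(chromosome, prefix=TABLE_PREFIX):
    """Build the name of the single shard that holds variants for one chromosome."""
    suffix = f"_chr{chromosome}"
    if suffix not in shard_suffixes():
        raise ValueError(f"unknown chromosome: {chromosome!r}")
    return prefix + suffix


print(len(shard_suffixes()))
print(table_for_chromosome("X"))
```

Querying only the shard for the chromosome of interest, rather than all 24 tables, is what keeps the amount of data scanned (and therefore the query cost) down.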