28 datasets found
  1. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    14 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website.

    About BigQuery: This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery?

    Download Dataset: This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query; see here for options and instructions. Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions here to download and install the gcloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True". To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R".
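
    A minimal Python sketch of querying the creative_stats table with the google-cloud-bigquery client is shown below. The table names come from the description above; the fully qualified path bigquery-public-data.google_ads_transparency_center and the column advertiser_disclosed_name are assumptions, so check the table schema in the BigQuery console before running.

    ```python
    # Hedged sketch: top advertisers by number of creatives.
    # Dataset path and column names are assumptions; verify against the schema.
    # Requires a Google account and a GCP project with BigQuery enabled.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT advertiser_disclosed_name, COUNT(*) AS num_creatives
        FROM `bigquery-public-data.google_ads_transparency_center.creative_stats`
        GROUP BY advertiser_disclosed_name
        ORDER BY num_creatives DESC
        LIMIT 10
    """

    for row in client.query(query).result():
        print(row.advertiser_disclosed_name, row.num_creatives)
    ```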

  2. COVID-19 Cases by Country

    • console.cloud.google.com
    Updated Jul 23, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:European%20Centre%20for%20Disease%20Prevention%20and%20Control (2020). COVID-19 Cases by Country [Dataset]. https://console.cloud.google.com/marketplace/product/european-cdc/covid-19-global-cases
    Explore at:
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Google (http://google.com/)
    Description

    This dataset is maintained by the European Centre for Disease Prevention and Control (ECDC) and reports on the geographic distribution of COVID-19 cases worldwide. This data includes COVID-19 reported cases and deaths broken out by country. This data can be visualized via ECDC's Situation Dashboard. More information on ECDC's response to COVID-19 is available here. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery? This dataset is hosted in both the EU and US regions of BigQuery; see the links below for the appropriate dataset copy (US region, EU region). This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. Users of ECDC public-use data files must comply with data use restrictions to ensure that the information will be used solely for statistical analysis or reporting purposes.

  3. Python Package Index (PyPI)

    • console.cloud.google.com
    Updated Jul 25, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Python%20Software%20Foundation&hl=ko (2023). Python Package Index (PyPI) [Dataset]. https://console.cloud.google.com/marketplace/product/gcp-public-data-pypi/pypi?hl=ko
    Explore at:
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides download statistics for all package downloads from the Python Package Index (PyPI). It also includes a dataset containing all the metadata for every distribution released on PyPI. The data is streamed in near-real-time from PyPI CDN, after which it is periodically loaded into the BigQuery dataset. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
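
    As an illustration of how the download statistics can be queried, here is a small sketch counting recent downloads for one package. The table path bigquery-public-data.pypi.file_downloads and its columns are assumptions based on common usage of this dataset; verify them against the published schema.

    ```python
    # Hedged sketch: downloads of the 'requests' package over the last 30 days.
    # Table path and column names are assumptions; confirm against the schema.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT COUNT(*) AS downloads
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE file.project = 'requests'
          AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    """

    row = next(iter(client.query(query).result()))
    print("requests downloads, last 30 days:", row.downloads)
    ```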

  4. BREATHE BioMedical Literature Dataset

    • console.cloud.google.com
    Updated Dec 23, 2022
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BREATHE&hl=sr (2022). BREATHE BioMedical Literature Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/breathe-gcp-public-data/breathe?hl=sr
    Explore at:
    Dataset updated
    Dec 23, 2022
    Dataset provided by
    Google (http://google.com/)
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    BREATHE is a large-scale biomedical database containing entries from 10 major repositories of biomedical research. Our dataset contains both abstract and full body texts of biomedical papers going back for decades and contains more than 16 million unique papers. This dataset can be used to train language models to better understand outcomes from biomedical research and uncover insights to combat the COVID-19 pandemic. This dataset is also available for access in Google Cloud Storage. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  5. Weather Data for COVID-19 Research

    • console.cloud.google.com
    Updated Jul 27, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:WeatherSource&hl=ko (2023). Weather Data for COVID-19 Research [Dataset]. https://console.cloud.google.com/marketplace/product/gcp-public-data-weather-source/weathersource-covid19?hl=ko
    Explore at:
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Weather Source, a leading provider of weather and climate technologies for business intelligence, is offering complimentary data for those researching the potential connections between weather and COVID-19 viability and transmission. This share includes:

    Global historical weather data dating back to October 2019
    Present data
    Forecast data out to 15 days

    The data supports temperature and humidity, both specific and relative, at the daily level. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery? This dataset is created and owned by Weather Source and made available for educational and academic research purposes. This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.

  6. USAFacts US Coronavirus Database

    • kaggle.com
    zip
    Updated May 31, 2020
    Cite
    Google BigQuery (2020). USAFacts US Coronavirus Database [Dataset]. https://www.kaggle.com/bigquery/covid19-usafacts
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    May 31, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Context

    To aid researchers, data scientists, and analysts in the effort to combat COVID-19, Google is making a hosted repository of public datasets, including OpenStreetMap data, free to access. To make it easier for the Kaggle community to access the BigQuery dataset, it has been onboarded to the Kaggle platform, which allows querying it without a linked GCP account. Please note that due to the large size of the dataset, Kaggle applies a quota of 5 TB of data scanned per user per 30 days.

    Description

    This data from USAFacts provides US COVID-19 case and death counts by state and county. This data is sourced from the CDC, and state and local health agencies.

    For more information, see the USAFacts site on the Coronavirus. Interactive data visualizations are also available via USAFacts.

  7. GOES 16/18/19

    • console.cloud.google.com
    Updated Apr 4, 2025
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:NOAA&hl=ja (2025). GOES 16/18/19 [Dataset]. https://console.cloud.google.com/marketplace/product/noaa-public/goes?hl=ja
    Explore at:
    Dataset updated
    Apr 4, 2025
    Dataset provided by
    Google (http://google.com/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    NOTICE: NEW GOES-19 Data! On April 4, 2025 at 1500 UTC, the GOES-19 satellite will be declared the Operational GOES-East satellite. All products and services, including NODD, for GOES-East will transition to GOES-19 data at that time. GOES-19 will operate out of the GOES-East location of 75.2°W starting on April 1, 2025 and through the operational transition. Until the transition time and during the final stretch of Post Launch Product Testing (PLPT), GOES-19 products are considered non-operational regardless of their validation maturity level. Shortly following the transition of GOES-19 to GOES-East, all data distribution from GOES-16 will be turned off. GOES-16 will drift to the storage location at 104.7°W. GOES-19 data should begin flowing again on April 4th once this maneuver is complete.

    NOTICE: As of January 10th 2023, GOES-18 assumed the GOES-West position and all data files are deemed both operational and provisional, so no "preliminary, non-operational" caveat is needed. GOES-17 is now offline, shifted approximately 105 degrees West, where it will be in on-orbit storage. GOES-17 data will no longer be available. Operational GOES-West products can be found in the GOES-18 bucket.

    The Geostationary Operational Environmental Satellite-R Series (GOES-R) is the next generation of geostationary weather satellites. The GOES-R series will significantly improve the detection and observation of environmental phenomena that directly affect public safety, protection of property and our nation's economic health and prosperity. GOES satellites (GOES-16, GOES-17, GOES-18, and GOES-19) provide continuous weather imagery and monitoring of meteorological and space environment data across North America. GOES satellites provide the kind of continuous monitoring necessary for intensive data analysis. They hover continuously over one position on the surface. The satellites orbit high enough to allow for a full-disc view of the Earth. Because they stay above a fixed spot on the surface, they provide a constant vigil for the atmospheric "triggers" for severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites are able to monitor storm development and track their movements. SUVI products are available in both NetCDF and FITS.

    GOES data can be found in the GCS buckets:
    gs://gcp-public-data-goes-16
    gs://gcp-public-data-goes-18
    gs://gcp-public-data-goes-19

    Pub/Sub topics you can subscribe to for updates:
    projects/gcp-public-data---goes-16/topics/gcp-public-data-goes-16
    projects/gcs-public-datasets/topics/gcp-public-data-goes-18
    projects/noaa-public/topics/gcp-public-data-goes-19

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery?
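
    A minimal sketch of subscribing to the GOES-19 Pub/Sub topic listed above follows. The topic path is taken from the description; YOUR_PROJECT_ID and the subscription name are placeholders, and the assumption that messages carry standard Cloud Storage notification attributes (such as objectId) should be verified.

    ```python
    # Hedged sketch: attach a subscription in your own project to the public
    # GOES-19 topic and print incoming object names for 60 seconds.
    from concurrent.futures import TimeoutError
    from google.cloud import pubsub_v1

    topic = "projects/noaa-public/topics/gcp-public-data-goes-19"  # from the description
    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path("YOUR_PROJECT_ID", "goes-19-updates")

    # One-time setup: create a subscription in your project attached to the public topic.
    subscriber.create_subscription(name=subscription, topic=topic)

    def callback(message):
        # "objectId" assumes standard GCS notification attributes.
        print("New object:", message.attributes.get("objectId"))
        message.ack()

    future = subscriber.subscribe(subscription, callback=callback)
    try:
        future.result(timeout=60)
    except TimeoutError:
        future.cancel()
    ```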

  8. Ethereum Blockchain

    • kaggle.com
    zip
    Updated Jul 3, 2021
    Cite
    Metis A.I. (2021). Ethereum Blockchain [Dataset]. https://www.kaggle.com/buryhuang/ethereum-blockchain
    Explore at:
    Available download formats: zip (5945941453 bytes)
    Dataset updated
    Jul 3, 2021
    Authors
    Metis A.I.
    License

    GNU General Public License v2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    Using public Ethereum data to analyze trading signals has become a trend, but querying it through Google BigQuery can be very costly. This free public dataset is created and kept updated so the public can avoid GCP overcharges.

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  9. Dogecoin Cryptocurrency Dataset

    • console.cloud.google.com
    Updated Jan 9, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Dogecoin&hl=zh_ZN (2023). Dogecoin Cryptocurrency Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/dogecoin/crypto-dogecoin?hl=zh_ZN
    Explore at:
    Dataset updated
    Jan 9, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Dogecoin is an open source peer-to-peer digital currency, favored by Shiba Inus worldwide. It is qualitatively more fun while being technically nearly identical to its close relative Bitcoin. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program . The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  10. High Resolution Rapid Refresh Model (HRRR)

    • console.cloud.google.com
    Updated Jul 12, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:NOAA&hl=es-419 (2023). High Resolution Rapid Refresh Model (HRRR) [Dataset]. https://console.cloud.google.com/marketplace/product/noaa-public/hrrr?hl=es-419&jsmode
    Explore at:
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Google (http://google.com/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Description

    The High-Resolution Rapid Refresh (HRRR) is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation. Radar data is assimilated in the HRRR every 15 min over a 1-h period adding further detail to that provided by the hourly data assimilation from the 13km radar-enhanced Rapid Refresh. For more information, see the HRRR Info Page from NOAA ESRL. In addition to the real-time data that is continuously updated, archived data is now available for HRRR forecasts. This data dates back as far as 2014, and is one of the most complete publicly-available archives of HRRR data. This dataset includes a Pub/Sub topic you can subscribe to in order to be notified of updates. Subscribe to the topic 'projects/gcp-public-data-weather/topics/gcp-public-data-hrrr'. Use the Pub/Sub Quickstarts guide to learn more. This public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
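
    Because the archive lives in a public Cloud Storage bucket, it can be browsed anonymously. The sketch below assumes the bucket name high-resolution-rapid-refresh and an hrrr.YYYYMMDD/conus/ object layout, both of which should be confirmed on the dataset's Marketplace page.

    ```python
    # Hedged sketch: list a few HRRR files for one model run without logging in.
    # Bucket name and prefix layout are assumptions.
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()

    for blob in client.list_blobs("high-resolution-rapid-refresh",
                                  prefix="hrrr.20240101/conus/",
                                  max_results=10):
        print(blob.name, blob.size)
    ```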

  11. Cloud Vulnerabilities Dataset

    • kaggle.com
    zip
    Updated Jun 19, 2025
    + more versions
    Cite
    SUNNY THAKUR (2025). Cloud Vulnerabilities Dataset [Dataset]. https://www.kaggle.com/datasets/cyberprince/cloud-vulnerabilities-dataset/discussion
    Explore at:
    Available download formats: zip (71217 bytes)
    Dataset updated
    Jun 19, 2025
    Authors
    SUNNY THAKUR
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Cloud Vulnerabilities Dataset (VUL0001-VUL1200)

    Overview

    The Cloud Vulnerabilities Dataset is a comprehensive collection of 1200 unique cloud security vulnerabilities, covering major cloud providers including AWS, Azure, Google Cloud Platform (GCP), Oracle Cloud, IBM Cloud, and Alibaba Cloud. This dataset is designed for cybersecurity professionals, penetration testers, machine learning engineers, and data scientists to analyze, train AI models, and enhance cloud security practices. Each entry details a specific vulnerability, including its description, category, cloud provider, vulnerable code (where applicable), proof of concept (PoC), and source references. The dataset emphasizes advanced and niche attack vectors such as misconfigurations, privilege escalations, data exposures, and denial-of-service (DoS) vulnerabilities, making it a valuable resource for red team exercises, security research, and AI-driven threat detection.

    Dataset Details

    Total Entries: 1200
    Format: JSONL (JSON Lines)
    File Names: cloud_vulnerabilities_dataset_1-1200.jsonl
    Timestamp: Entries are timestamped as of June 19, 2025.

    Categories:
    Access Control
    Data Exposure
    Privilege Escalation
    Data Exfiltration
    Denial of Service
    Code Injection
    Authentication
    Encryption
    Network Security
    Session Management
    Domain Hijacking
    Data Loss

    Cloud Providers Covered:
    Amazon Web Services (AWS)
    Microsoft Azure
    Google Cloud Platform (GCP)
    Oracle Cloud
    IBM Cloud
    Alibaba Cloud

    Dataset Structure

    Each entry in the dataset is a JSON object with the following fields:

    id: Unique identifier for the vulnerability (e.g., VUL0001).
    description: Detailed description of the vulnerability.
    category: Type of vulnerability (e.g., Data Exposure, Privilege Escalation).
    cloud_provider: The cloud platform affected (e.g., AWS, Azure).
    vulnerable_code: Example of misconfigured code or settings (if applicable).
    poc: Proof of concept command or script to demonstrate the vulnerability.
    source: Reference to CVE or documentation link.
    timestamp: Date and time of the entry (ISO 8601 format, e.g., 2025-06-19T12:10:00Z).
    
    Example Entry
    {
     "id": "VUL1190",
     "description": "Alibaba Cloud ECS with misconfigured snapshot policy allowing data exposure.",
     "category": "Data Exposure",
     "cloud_provider": "Alibaba Cloud",
     "vulnerable_code": "{ \"SnapshotPolicy\": { \"publicAccess\": true } }",
     "poc": "aliyun ecs DescribeSnapshots --SnapshotId snapshot-id",
     "source": {
      "cve": "N/A",
      "link": "https://www.alibabacloud.com/help/doc-detail/25535.htm"
     },
     "timestamp": "2025-06-19T12:10:00Z"
    }
    

    Usage

    This dataset can be used for:

    Penetration Testing: Leverage PoC scripts to test cloud environments for vulnerabilities.
    AI/ML Training: Train machine learning models for anomaly detection, vulnerability classification, or automated remediation.
    Security Research: Analyze trends in cloud misconfigurations and attack vectors.
    Education: Teach cloud security best practices and vulnerability mitigation strategies.

    Prerequisites

    Tools: Familiarity with cloud CLI tools (e.g., AWS CLI, Azure CLI, gcloud, oci, ibmcloud, aliyun).
    Programming: Knowledge of Python, JSON parsing, or scripting for processing JSONL files.
    Access: Valid cloud credentials for testing PoCs in a controlled, authorized environment.

    Getting Started

    Download the Dataset: Obtain the JSONL files: cloud_vulnerabilities_dataset_1-1200.jsonl

    Parse the Dataset: Use a JSONL parser (e.g., Python’s json module) to read and process entries.

    import json
    
    with open('cloud_vulnerabilities_dataset_1-1200.jsonl', 'r') as file:
      for line in file:
        entry = json.loads(line.strip())
        print(entry['id'], entry['description'])
    
    
    

    Run PoCs:

    Execute PoC commands in a sandboxed environment to verify vulnerabilities (ensure proper authorization).
    Example: aws s3 ls s3://bucket for AWS S3 vulnerabilities.
    
    

    Analyze Data: Use data analysis tools (e.g., Pandas, Jupyter) to explore vulnerability patterns or train ML models.
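
    A small pandas sketch for that last step, assuming only the file name given above:

    ```python
    # Hedged sketch: load the JSONL file and tabulate categories per provider.
    import pandas as pd

    df = pd.read_json("cloud_vulnerabilities_dataset_1-1200.jsonl", lines=True)

    # Rows are providers, columns are vulnerability categories.
    summary = df.groupby(["cloud_provider", "category"]).size().unstack(fill_value=0)
    print(summary)
    ```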

    Security Considerations

    Ethical Use: Only test PoCs in environments where you have explicit permission.
    Data Sensitivity: Handle dataset entries with care, as they contain sensitive configuration examples.
    Mitigation: Refer to source links for official documentation on fixing vulnerabilities.

    Contributing

    Contributions to expand or refine the dataset are welcome. Please submit pull requests with:

    New vulnerability entries in JSONL format.
    Clear documentation of the vulnerability, PoC, and source.
    Ensure no duplicate IDs or entries.

    License

    This dataset is released under the MIT License. You are free to use, modify, and distribute it, provided the original attribution is maintained.

    Contact

    For questions, feedback, or contributions, please reach out via:

    Email: sunny48445@gmail.com

    Acknowledgments

    Inspir...

  12. Synthea Generated Synthetic Data in FHIR

    • console.cloud.google.com
    Updated Jul 27, 2023
    Cite
    The MITRE Corporation (2023). Synthea Generated Synthetic Data in FHIR [Dataset]. https://console.cloud.google.com/marketplace/product/mitre/synthea-fhir?hl=fr
    Explore at:
    Dataset updated
    Jul 27, 2023
    Dataset authored and provided by
    The MITRE Corporation (https://www.mitre.org/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Synthea Generated Synthetic Data in FHIR hosts over 1 million synthetic patient records generated using Synthea in FHIR format. Exported from the Google Cloud Healthcare API FHIR Store into BigQuery using analytics schema . This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . This public dataset is also available in Google Cloud Storage and available free to use. The URL for the GCS bucket is gs://gcp-public-data--synthea-fhir-data-1m-patients. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Please cite SyntheaTM as: Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan, Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, Journal of the American Medical Informatics Association, Volume 25, Issue 3, March 2018, Pages 230–238, https://doi.org/10.1093/jamia/ocx079

  13. TotalSegmentator segmentations and radiomics features for NCI Imaging Data...

    • zenodo.org
    bin, csv
    Updated May 25, 2024
    + more versions
    Cite
    Vamsi Krishna Thiriveedhi; Deepa Krishnaswamy; David Clunie; Andrey Fedorov (2024). TotalSegmentator segmentations and radiomics features for NCI Imaging Data Commons CT images [Dataset]. http://doi.org/10.5281/zenodo.8347012
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    May 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vamsi Krishna Thiriveedhi; Deepa Krishnaswamy; David Clunie; Andrey Fedorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contributes volumetric segmentations of the anatomic regions in a subset of CT images available from NCI Imaging Data Commons [1] (https://imaging.datacommons.cancer.gov/) automatically generated using the TotalSegmentation model v1.5.6 [2]. The initial release includes segmentations for the majority of the CT scans included in the National Lung Screening Trial (NLST) collection [3], [4] already available in IDC. Direct link to open this analysis result dataset in IDC (available after release of IDC v18): https://portal.imaging.datacommons.cancer.gov/explore/filters/?analysis_results_id=TotalSegmentator-CT-Segmentations.

    Specifically, for each of the CT series analyzed, we include segmentations as generated by TotalSegmentator, converted into DICOM Segmentation object format using dcmqi v1.3.0 [5], and first order and shape features for each of the segmented regions, as produced by pyradiomics v3.0.1 [6]. Radiomics features were converted to DICOM Structured Reporting documents following template TID1500 using dcmqi. TotalSegmentator analysis on the NLST cohort was executed using Terra platform [7]. Implementation of the workflow that was used for performing the analysis is available at https://github.com/ImagingDataCommons/CloudSegmentator [8].

    Due to the large size of the files, they are stored in the cloud buckets maintained by IDC, and the attached files are the manifests that can be used to download the actual files.

    The GCP and AWS manifests provided with this dataset record can be used to download the corresponding files from the IDC Google Cloud Storage (GCS) or Amazon S3 (AWS) buckets free of charge following the instructions available in IDC documentation here: https://learn.canceridc.dev/data/downloading-data. Specifically, you will need to install the s5cmd command line tool on your computer (see instructions at https://github.com/peak/s5cmd#installation), and follow the manifest-specific download instructions accompanying the file list below.

    If you use the files referenced in the attached manifests, we ask you to cite this dataset and the preprint describing how it was generated [9].

    Specific files included in the record are:

    1. totalsegmentator_ct_segmentations_aws.s5cmd.zip: compressed AWS-based manifest (to download the files described in the manifest, execute this command: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com run totalsegmentator_ct_segmentations_aws.s5cmd)

    2. totalsegmentator_ct_segmentations_gcs.s5cmd.zip: GCS-based manifest (to download the files described in the manifest, execute this command: s5cmd --no-sign-request --endpoint-url https://storage.googleapis.com run totalsegmentator_ct_segmentations_gcs.s5cmd)

    3. Gen3-based manifest (see details in https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids).

  14. GOES-17

    • console.cloud.google.com
    Updated Apr 26, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:NOAA&hl=zh_ZN (2023). GOES-17 [Dataset]. https://console.cloud.google.com/marketplace/product/noaa-public/goes-17?hl=zh_ZN
    Explore at:
    Dataset updated
    Apr 26, 2023
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Google (http://google.com/)
    Description

    GOES-17 (Geostationary Operational Environmental Satellite) is the second in the GOES-R series that promises significant upgrades in observing environmental phenomena. It provides images of weather patterns and severe storms as frequently as every 30 seconds, which supports more accurate and reliable weather forecasts and severe weather outlooks. The dataset includes a feed of the Advanced Baseline Imager (ABI) radiance data (Level 1b) and Cloud and Moisture Imager (CMI) products (Level 2). The NOAA Big Data Project (BDP) is an experimental collaboration between NOAA and infrastructure-as-a-service (IaaS) providers to explore methods of expanding the accessibility of NOAA’s data to facilitate innovation and collaboration. The goal is to help form new lines of business and facilitate economic growth while making NOAA's data more easily discoverable for the American public. This public dataset is hosted in Google Cloud Storage and available free to use. Click the "view dataset" button at the top to access the raw NetCDF files in Cloud Storage. Check out this quick start guide to learn how to access public datasets on Google Cloud Storage. This dataset includes a Pub/Sub topic you can subscribe to in order to be notified of updates. Subscribe to the topic 'projects/gcp-public-data---goes-17/topics/gcp-public-data-goes-17'. Use the Pub/Sub Quickstarts guide to learn more.

  15. gnomAD

    • console.cloud.google.com
    Updated Jul 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Broad%20Institute%20of%20MIT%20and%20Harvard&hl=zh_TW (2023). gnomAD [Dataset]. https://console.cloud.google.com/marketplace/product/broad-institute/gnomad?hl=zh_TW
    Explore at:
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    The Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables . Each dataset is sharded by chromosome meaning variants are distributed across 24 tables (indicated with “_chr*” suffix). Utilizing the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support . These public datasets are included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud . Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
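
    To illustrate the cost advantage of the per-chromosome shards, the sketch below queries a single chromosome table. The exact dataset and table names (here bigquery-public-data.gnomAD.v3_genomes__chr21) and the column names are assumptions; look them up in the BigQuery console first.

    ```python
    # Hedged sketch: read a handful of variants from one chromosome shard.
    # Table path and column names are assumptions; verify before running.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT start_position, reference_bases
        FROM `bigquery-public-data.gnomAD.v3_genomes__chr21`
        LIMIT 10
    """

    for row in client.query(query).result():
        print(row.start_position, row.reference_bases)
    ```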

  16. Data from: Bitcoin Cryptocurrency

    • console.cloud.google.com
    Updated Mar 26, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Bitcoin&hl=fr_FR (2023). Bitcoin Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/bitcoin/crypto-bitcoin?hl=fr_FR
    Explore at:
    Dataset updated
    Mar 26, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Bitcoin is a crypto currency leveraging blockchain technology to store transactions in a distributed ledger. A blockchain is an ever-growing tree of blocks. Each block contains a number of transactions. To learn more, read the Bitcoin Wiki . This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. To further interoperate with Ethereum and ERC-20 token transactions, we also created some views that abstract the blockchain ledger to be presented as a double-entry accounting ledger. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out our blog post on the Google Cloud Big Data Blog and try the sample query below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
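
    The sample query referenced in the description is not reproduced here, so the following is only an illustrative sketch of the kind of analysis the unified schema supports; the table path bigquery-public-data.crypto_bitcoin.transactions and its column names are assumptions to verify against the dataset.

    ```python
    # Hedged sketch: total BTC moved per day over the past week.
    # Table path and columns are assumptions; output_value is assumed to be in satoshis.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT DATE(block_timestamp) AS day,
               SUM(output_value) / 1e8 AS total_btc_moved
        FROM `bigquery-public-data.crypto_bitcoin.transactions`
        WHERE block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
        GROUP BY day
        ORDER BY day
    """

    for row in client.query(query).result():
        print(row.day, row.total_btc_moved)
    ```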

  17. Zip Code Tabulation Area (ZCTA)

    • console.cloud.google.com
    Updated Oct 21, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:United%20States%20Census%20Bureau&hl=zh-TW (2023). Zip Code Tabulation Area (ZCTA) [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/zcta?hl=zh-TW&jsmode
    Explore at:
    Dataset updated
    Oct 21, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    These are the full-resolution ZIP Code Tabulation Area (ZCTA) boundaries, derived from the US Census Bureau's TIGER/Line Shapefiles. The dataset contains polygons that roughly approximate each of the USPS 5-digit zip codes. It is one of many geography datasets available in BigQuery through the Google Cloud Public Dataset Program to support geospatial analysis. You can find more information on the other datasets at the US Geographic Boundaries Marketplace page. Though they do not continuously cover all land and water areas in the United States, ZCTAs are a great way to visualize geospatial data in an understandable format with excellent spatial resolution. This dataset gives the area of land and water within each zip code, as well as the corresponding city and state for each zip code. This makes the dataset an excellent candidate for JOINs to support geospatial queries with BigQuery's GIS capabilities. Note: BQ-GIS is in public beta, so your GCP project will need to be whitelisted to try out these queries. You can sign up to request access here. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery?
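
    A minimal sketch of the kind of GIS lookup the description mentions: finding the ZCTA that contains a given coordinate. The table path bigquery-public-data.geo_us_boundaries.zip_codes and its column names are assumptions; check them in the BigQuery console.

    ```python
    # Hedged sketch: point-in-polygon lookup with BigQuery GIS functions.
    # Table path and columns are assumptions; ST_GEOGPOINT takes (longitude, latitude).
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT zip_code, city, state_code
        FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
        WHERE ST_CONTAINS(zip_code_geom, ST_GEOGPOINT(-122.4194, 37.7749))
    """

    for row in client.query(query).result():
        print(row.zip_code, row.city, row.state_code)
    ```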

  18. Bitcoin Cash Cryptocurrency Dataset

    • console.cloud.google.com
    Updated Apr 23, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Bitcoin%20Cash&hl=es (2023). Bitcoin Cash Cryptocurrency Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bitcoin-cash/crypto-bitcoin-cash?hl=es
    Explore at:
    Dataset updated
    Apr 23, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Bitcoin Cash is a cryptocurrency that allows more bytes to be included in each block relative to its common ancestor Bitcoin. This dataset contains the blockchain data in its entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery?

  19. ISB-CGC Cancer Gateway in the Cloud

    • console.cloud.google.com
    Updated Jul 21, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:ISB%20Cancer%20Gateway&hl=en_GB (2023). ISB-CGC Cancer Gateway in the Cloud [Dataset]. https://console.cloud.google.com/marketplace/product/gcp-public-data-isb-cgc/isb-cgc-cancer-data?hl=en_GB
    Explore at:
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three National Cancer Institute (NCI) Cloud Resources tasked with bringing cancer data and computation power together through cloud platforms. It is a collaboration between the Institute for Systems Biology (ISB) and General Dynamics Information Technology Inc. (GDIT). Since starting in 2014 as part of NCI’s Cloud Pilot Resource initiative, ISB-CGC has provided access to increasing amounts of cancer data in the cloud. In Google BigQuery, ISB-CGC stores high-level clinical, biospecimen, genomic and proteomic cancer research data obtained from the NCI Genomic Data Commons (GDC) and Proteomics Data Commons (PDC). It also stores a large amount of metadata about files that are stored in the GDC Google Cloud Storage, as well as genome reference sources (e.g. GENCODE, miRBase, etc.). The majority of these datasets and tables are completely open access and available to the research community. ISB-CGC has consolidated the data by research program and data type (ex. Clinical, DNA Methylation, RNAseq, Somatic Mutation, etc.) and transformed it into ISB-CGC Google BigQuery tables for ease of access and analysis. This novel approach allows users to quickly analyze information from thousands of patients. The ISB-CGC BigQuery Table Search UI is a discovery tool that allows users to explore and search for ISB-CGC hosted BigQuery tables. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery

  20. OnPoint Weather - Past Weather and Climatology Data Sample

    • console.cloud.google.com
    Updated May 13, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Weather%20Source&hl=zh-tw (2023). OnPoint Weather - Past Weather and Climatology Data Sample [Dataset]. https://console.cloud.google.com/marketplace/product/weathersource-com/weather-past-climatology?hl=zh-tw
    Explore at:
    Dataset updated
    May 13, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    OnPoint Weather is a global weather dataset for business available for any lat/lon point and geographic area such as ZIP codes. OnPoint Weather provides a continuum of hourly and daily weather from the year 2000 to the current time and a forward forecast of 45 days. OnPoint Climatology provides hourly and daily weather statistics which can be used to determine ‘departures from normal’ and to provide climatological guidance of expected weather for any location at any point in time. The OnPoint Climatology provides weather statistics such as means, standard deviations and frequency of occurrence. Weather has a significant impact on businesses and accounts for hundreds of billions in lost revenue annually. OnPoint Weather allows businesses to quantify weather impacts and develop strategies to optimize for weather to improve business performance.

    Examples of usage:
    Quantify the impact of weather on sales across diverse locations and times of the year
    Understand how supply chains are impacted by weather
    Understand how employees’ attendance and performance are impacted by weather
    Understand how weather influences foot traffic at malls, stores and restaurants

    OnPoint Weather is available through Google Cloud Platform’s Commercial Dataset Program and can be easily integrated with other Google Cloud Platform services to quickly reveal and quantify weather impacts on business. Weather Source provides a full range of support services, from answering quick questions to consulting and building custom solutions. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery?
