100+ datasets found
  1. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview: Dataset in BigQuery. TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
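
    A minimal sketch of how the files described above relate to one another, assuming the column names listed in this description. The sample rows, product names, and the helper name revenue_per_order are invented for illustration and are not taken from the dataset. It joins order_items.csv to products.csv on product_id and totals retail_price per order.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the description above.
ORDER_ITEMS_CSV = """id,order_id,user_id,product_id,status
1,100,7,11,Complete
2,100,7,12,Complete
3,101,8,11,Returned
"""

PRODUCTS_CSV = """id,name,retail_price
11,Sample Tee,20.00
12,Sample Jeans,45.50
"""


def revenue_per_order(order_items_file, products_file):
    """Join order items to products on product_id and total retail_price per order."""
    # Build a lookup from product id to retail price.
    price_by_product = {
        row["id"]: float(row["retail_price"])
        for row in csv.DictReader(products_file)
    }
    totals = defaultdict(float)
    for item in csv.DictReader(order_items_file):
        totals[item["order_id"]] += price_by_product[item["product_id"]]
    return dict(totals)


totals = revenue_per_order(io.StringIO(ORDER_ITEMS_CSV), io.StringIO(PRODUCTS_CSV))
print(totals)
```

    With the real files, the io.StringIO objects would be replaced by open("order_items.csv", newline="") and open("products.csv", newline="").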
  2. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  3. AlphaFold Protein Structure Database

    • console.cloud.google.com
    Updated Aug 9, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=en-GB (2023). AlphaFold Protein Structure Database [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold?hl=en-GB
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    License
    Description

    The AlphaFold Protein Structure Database is a collection of protein structure predictions made using the machine learning model AlphaFold. AlphaFold was developed by DeepMind, and this database was created in partnership with EMBL-EBI. For information on how to interpret, download and query the data, as well as on which proteins are included / excluded, and the change log, please see our main dataset guide and FAQs. To interactively view individual entries or to download proteomes / Swiss-Prot, please visit https://alphafold.ebi.ac.uk/. The current release aims to cover most of the over 200M sequences in UniProt (a commonly used reference set of annotated proteins). The files provided for each entry include the structure plus two model confidence metrics (pLDDT and PAE). The files can be found in the Google Cloud Storage bucket gs://public-datasets-deepmind-alphafold-v4, with metadata in the BigQuery table bigquery-public-data.deepmind_alphafold.metadata.

    If you use this data, please cite:
    • Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021)
    • Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021)

    This public dataset is hosted in Google Cloud Storage and is available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.

  4. Data from: bigquery

    • huggingface.co
    Updated Aug 4, 2024
    Cite
    Dereje Hinsermu (2024). bigquery [Dataset]. https://huggingface.co/datasets/derekiya/bigquery
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 4, 2024
    Authors
    Dereje Hinsermu
    Description

    derekiya/bigquery dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. MultiversX Blockchain

    • console.cloud.google.com
    Updated Jan 10, 2024
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data (2024). MultiversX Blockchain [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/blockchain-analytics-multiversx-mainnet-eu
    Dataset updated
    Jan 10, 2024
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    MultiversX is a highly scalable, secure and decentralized blockchain network created to enable radically new applications for users, businesses, society, and the new metaverse frontier. This dataset is one of many crypto datasets that are available within Google Cloud Public Datasets. As with other Google Cloud public datasets, you can query this dataset for free, up to 1TB of processing every month. Watch this short video to learn how to get started with the public datasets.

  6. Stack Overflow Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Stack Overflow (http://stackoverflow.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

    Content

    Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: https://archive.org/download/stackexchange

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

    https://cloud.google.com/bigquery/public-data/stackoverflow

    Banner photo by Caspar Rubin from Unsplash.

    Inspiration

    What is the percentage of questions that have been answered over the years?

    What is the reputation and badge count of users across different tenures on StackOverflow?

    What are 10 of the “easier” gold badges to earn?

    Which day of the week has most questions answered within an hour?

  7. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats.

    The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided.

    The removed_creative_stats table contains information about ads served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad.

    Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website.

    About BigQuery

    This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

    Download Dataset

    This public dataset is also hosted in Google Cloud Storage and is available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service is included with the dataset. You can also download the results from a custom query.

    Signed-out users can download the full dataset by using the gcloud CLI, after downloading and installing it. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True". To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R".
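
    The two gcloud commands quoted in the description can be wrapped in a small script. This is a sketch only: the helper names build_download_commands and download are hypothetical, and it assumes the gcloud CLI is installed and on the PATH. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess

# Bucket path quoted in the description above.
BUCKET_GLOB = "gs://ads-transparency-center/*"


def build_download_commands(destination="."):
    """Return the two gcloud invocations from the description as argument lists."""
    return [
        ["gcloud", "config", "set", "auth/disable_credentials", "True"],
        ["gcloud", "storage", "cp", BUCKET_GLOB, destination, "-R"],
    ]


def download(destination=".", dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for argv in build_download_commands(destination):
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if gcloud exits non-zero.
            subprocess.run(argv, check=True)


download(dry_run=True)
```

    Passing the arguments as a list avoids shell expansion of the "*", so the wildcard reaches gcloud unchanged.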

  8. noaa-global-forecast-system

    • console.cloud.google.com
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data, noaa-global-forecast-system [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/noaa-global-forecast-system
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). The GFS dataset consists of selected model outputs (described below) as gridded forecast variables. The 384-hour forecasts, with 3-hour forecast interval, are made at 6-hour temporal resolution (i.e. updated four times daily). Use the 'creation_time' and 'forecast_time' properties to select data of interest. The GFS is a coupled model, composed of an atmosphere model, an ocean model, a land/soil model, and a sea ice model which work together to provide an accurate picture of weather conditions. See history of recent modifications to the global forecast/analysis system, the model performance statistical web page, and the documentation homepage for more information.
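
    To make the timing in this description concrete, here is a rough sketch that enumerates the forecast_time values belonging to one creation_time. It assumes that runs start at 00, 06, 12 and 18 UTC and that the hour-0 step is included; neither detail is stated in the description, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FORECAST_HORIZON_HOURS = 384  # from the description
FORECAST_STEP_HOURS = 3       # from the description
RUN_INTERVAL_HOURS = 6        # "updated four times daily"


def forecast_times(creation_time):
    """List forecast_time values for one run, including hour 0 (an assumption)."""
    return [
        creation_time + timedelta(hours=h)
        for h in range(0, FORECAST_HORIZON_HOURS + 1, FORECAST_STEP_HOURS)
    ]


def latest_run_at_or_before(moment):
    """Round a UTC timestamp down to the latest run, assuming runs at 00/06/12/18 UTC."""
    hour = moment.hour - moment.hour % RUN_INTERVAL_HOURS
    return moment.replace(hour=hour, minute=0, second=0, microsecond=0)


run = latest_run_at_or_before(datetime(2024, 1, 10, 14, 30, tzinfo=timezone.utc))
times = forecast_times(run)
print(run.isoformat(), len(times), times[-1].isoformat())
```

    Under these assumptions each run carries 129 forecast steps, the last one 16 days after the run's creation_time.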

  9. Cloud Data Warehouse Solutions Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 15, 2025
    Cite
    Data Insights Market (2025). Cloud Data Warehouse Solutions Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-data-warehouse-solutions-1385894
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Aug 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Cloud Data Warehouse (CDW) solutions market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and secure data storage and analytics solutions across various industries. The market's expansion is fueled by several factors, including the proliferation of big data, the rise of cloud computing adoption, and the growing demand for real-time business intelligence. Organizations are migrating from on-premise data warehouses to cloud-based solutions to leverage the benefits of scalability, elasticity, and pay-as-you-go pricing models. This shift is further accelerated by the increasing complexity of data management and the need for advanced analytics capabilities to gain actionable insights from vast datasets. Competition is fierce, with major players like Amazon Redshift, Snowflake, Google Cloud, and Microsoft Azure Synapse leading the market, each offering unique strengths and capabilities. However, the market also witnesses the emergence of niche players catering to specific industry needs or geographical regions. The overall market is segmented based on deployment models (public, private, hybrid), service models (SaaS, PaaS, IaaS), and industry verticals (finance, healthcare, retail, etc.). Future growth will likely be influenced by advancements in technologies such as AI, machine learning, and serverless computing, further enhancing the analytical capabilities of CDW solutions. The projected Compound Annual Growth Rate (CAGR) suggests a substantial increase in market value over the forecast period (2025-2033). Assuming a conservative CAGR of 15% (a reasonable estimate considering the rapid technological advancements in this space), and a 2025 market size of $50 billion (a reasonable estimate based on industry reports), the market is poised for significant expansion. 
This growth will be influenced by factors such as increasing data volumes, advancements in data analytics techniques, and the growing adoption of cloud-based technologies by small and medium-sized businesses (SMBs). Despite the rapid growth, challenges remain, including data security concerns, integration complexities, and vendor lock-in. However, continuous innovation and the development of robust security measures will mitigate these challenges, paving the way for sustained market growth in the coming years.
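
    The compound-growth arithmetic implied by the report's stated assumptions (a $50 billion market in 2025 growing at a 15% CAGR through 2033) can be worked through as follows. The figures are the report's own estimates rather than measured values, and the function name is hypothetical.

```python
def project_market_size(base_size, cagr, base_year, target_year):
    """Compound base_size forward at a constant annual growth rate."""
    return base_size * (1 + cagr) ** (target_year - base_year)


BASE_SIZE_BN = 50.0  # 2025 market size in billions of USD, as assumed in the report
CAGR = 0.15          # 15% compound annual growth rate, as assumed in the report

for year in range(2025, 2034):
    size = project_market_size(BASE_SIZE_BN, CAGR, 2025, year)
    print(year, round(size, 1))

size_2033 = project_market_size(BASE_SIZE_BN, CAGR, 2025, 2033)
```

    Eight years of 15% growth multiply the base by about 3.06, so these assumptions imply roughly $153 billion in 2033.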

  10. ckanext-datastore-bigquery - Extensions - CKAN Ecosystem Catalog Beta

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-datastore-bigquery - Extensions - CKAN Ecosystem Catalog Beta [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-datastore-bigquery
    Dataset updated
    Jun 4, 2025
    Description

    The datastore-bigquery extension for CKAN allows users to leverage Google Cloud BigQuery for datastore search and SQL queries, providing an alternative to CKAN's standard datastore. By integrating with BigQuery, this extension aims to enhance performance and scalability for data-intensive operations against data stored as BigQuery tables. This plugin allows CKAN to query data that actually resides in Google BigQuery.

    Key Features:
    • BigQuery Integration: Enables CKAN's datastore search and datastore SQL API to query data directly from Google BigQuery tables.
    • Alternative to Standard Datastore: Offers BigQuery as a backend option, providing users with flexibility in choosing their data storage and query engine.
    • Credential-Based Authentication: Relies on Google Cloud credentials (JSON file) for secure authentication and authorization to BigQuery resources.
    • Test Suite: Comes with a test suite that can be run as a standalone instance via pytest or as an integrated CKAN plugin via nosetests.

    Technical Integration: The extension integrates into CKAN as a plugin. You will need to enable it in the .ini configuration file. The extension uses Google Cloud credentials to authenticate and authorize access to BigQuery, enabling data access and querying within the CKAN environment.

    Benefits & Impact: This extension is valuable for CKAN deployments dealing with big datasets hosted in BigQuery, offering potentially significant performance and scalability benefits compared to CKAN's default datastore implementation. Using BigQuery as the data backend removes the dependency on, and the limitations of, the CKAN datastore.

  11. Google Trends

    • console.cloud.google.com
    Updated Jun 11, 2022
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ES (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=ES
    Dataset updated
    Jun 11, 2022
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google Search (http://google.com/)
    Google (http://google.com/)
    Description

    The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  12. COVID-19 Data Repository by CSSE at JHU

    • console.cloud.google.com
    Updated Mar 26, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Johns%20Hopkins%20University&hl=it (2023). COVID-19 Data Repository by CSSE at JHU [Dataset]. https://console.cloud.google.com/marketplace/product/johnshopkins/covid19_jhu_global_case?hl=it
    Dataset updated
    Mar 26, 2023
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The data include the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries, aggregated at the appropriate province/state. It was developed to enable researchers, public health authorities and the general public to track the outbreak. Additional information is available in the blog post, Mapping 2019-nCoV, and the included data sources are listed separately.

    For publications that use the data, please cite the following publication: Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

    This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.

  13. USPTO Cancer Moonshot Patent Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). USPTO Cancer Moonshot Patent Data [Dataset]. https://www.kaggle.com/datasets/bigquery/uspto-oce-cancer
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    This curated dataset consists of 269,353 patent documents (published patent applications and granted patents) spanning the 1976 to 2016 period and is intended to help identify promising R&D on the horizon in diagnostics, therapeutics, data analytics, and model biological systems.

    Content

    USPTO Cancer Moonshot Patent Data was generated using USPTO examiner tools to execute a series of queries designed to identify cancer-specific patents and patent applications. This includes drugs, diagnostics, cell lines, mouse models, radiation-based devices, surgical devices, image analytics, data analytics, and genomic-based inventions.

    Acknowledgements

    “USPTO Cancer Moonshot Patent Data” by the USPTO, for public use. Frumkin, Jesse and Myers, Amanda F., Cancer Moonshot Patent Data (August, 2016).

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_cancer

    Banner photo by Jaron Nix on Unsplash

  14. Cloud Data Warehouse Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 4, 2025
    Cite
    Data Insights Market (2025). Cloud Data Warehouse Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-data-warehouse-1958553
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The cloud data warehouse market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and readily accessible data analytics solutions. The market's expansion is fueled by several key factors, including the burgeoning adoption of cloud computing across various industries, the proliferation of big data, and the growing demand for real-time business intelligence. Organizations are migrating from on-premise data warehouses to cloud-based solutions to leverage enhanced scalability, reduced infrastructure costs, and improved agility. This shift is further accelerated by the availability of advanced analytics tools and services within the cloud ecosystem, enabling businesses to derive actionable insights from their data more efficiently. Competitive pressures and the need to gain a competitive edge are also significant drivers, pushing enterprises to adopt sophisticated data warehousing solutions capable of handling complex analytical workloads. The market is highly fragmented, with major players such as Amazon, Google, Microsoft, and others competing intensely through innovation, strategic partnerships, and aggressive pricing strategies. While the market shows significant promise, certain challenges persist. Data security and privacy concerns remain a major obstacle to wider adoption, particularly in regulated industries. Integration complexities with existing on-premise systems and the need for skilled professionals to manage and maintain cloud data warehouses also present hurdles. However, ongoing technological advancements in areas such as data encryption, access control, and automated data integration are mitigating these challenges. Furthermore, the emergence of new technologies, such as serverless architectures and AI-powered analytics, is continuously reshaping the market landscape, fostering innovation and expanding the market's potential. 
    Over the forecast period (2025-2033), consistent growth is anticipated, fueled by ongoing digital transformation initiatives across various sectors. We estimate a conservative CAGR of 15% over this period (based on industry averages for similar tech sectors), indicating substantial growth opportunities.

  15. SEC Public Dataset

    • console.cloud.google.com
    Updated Jul 19, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=en_GB (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=en_GB
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anyone to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The data is updated quarterly and goes back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  16. SEC Public Dataset

    • console.cloud.google.com
    Updated May 14, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-cn (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-cn
    Explore at:
    Dataset updated
    May 14, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anyone to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The data is updated quarterly and goes back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  17. bigquery-public-data.noaa_significant_earthquakes

    • kaggle.com
    zip
    Updated Nov 5, 2023
    Cite
    Debrup Mukherjee (2023). bigquery-public-data.noaa_significant_earthquakes [Dataset]. https://www.kaggle.com/datasets/mukherjeedebrup/bigquery-public-data-noaa-significant-earthquakes
    Explore at:
    Available download formats: zip (14393 bytes)
    Dataset updated
    Nov 5, 2023
    Authors
    Debrup Mukherjee
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Debrup Mukherjee

    Released under CC0: Public Domain

    Contents

  18. Big Data Processing And Distribution Systems Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 6, 2025
    Cite
    Data Insights Market (2025). Big Data Processing And Distribution Systems Report [Dataset]. https://www.datainsightsmarket.com/reports/big-data-processing-and-distribution-systems-528339
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data Processing and Distribution Systems market is experiencing robust growth, driven by the exponential increase in data volume across various industries. The market, estimated at $50 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $150 billion by 2033. This expansion is fueled by several key factors. The rising adoption of cloud-based solutions, offering scalability and cost-effectiveness, is a significant driver. Furthermore, the increasing demand for real-time analytics and advanced data processing capabilities across sectors like finance, healthcare, and e-commerce is propelling market growth. The emergence of new technologies such as edge computing and AI-powered analytics is further accelerating the adoption of sophisticated big data processing solutions.

    However, market growth is not without its challenges. Data security and privacy concerns, coupled with the complexity of implementing and managing big data systems, remain significant restraints. The need for specialized skills and expertise in data science and engineering also contributes to the overall cost and complexity of adoption. Despite these challenges, the market's continued expansion is anticipated, driven by the persistent need for efficient and insightful data management in an increasingly data-driven world.

    Segmentation within the market is diverse, encompassing various solutions including cloud-based platforms, on-premise systems, and specialized tools for data integration, processing, and visualization. Leading players such as Google, AWS, Microsoft, Snowflake, and Databricks are fiercely competing to capture market share, further stimulating innovation and driving market expansion.
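
The growth figures quoted in the description above can be sanity-checked with the standard compound-growth formula, value = base * (1 + rate) ^ years. The sketch below is only an illustration, not the report's method: it assumes the $50 billion base applies to 2025 and that 15% compounds once per year over the eight years to 2033. The function name `project_market_size` is hypothetical.

```python
def project_market_size(base_value, cagr, years):
    """Compound base_value at annual rate cagr for the given number of years."""
    return base_value * (1 + cagr) ** years


# Figures taken from the description: $50 billion in 2025, 15% CAGR, horizon 2033.
# Assumption: growth compounds annually over 2033 - 2025 = 8 years.
projected_2033 = project_market_size(50.0, 0.15, 2033 - 2025)
print(f"Projected 2033 market size: ${projected_2033:.1f} billion")
```

Under these assumptions the projection comes out at roughly $153 billion, which is consistent with the "approximately $150 billion" stated in the description.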

  19. Iowa Liquor Sales

    • arjunrana.com
    • data.iowa.gov
    • +4 more
    Updated Nov 1, 2025
    + more versions
    Cite
    Iowa Department of Revenue, Alcoholic Beverages (2025). Iowa Liquor Sales [Dataset]. https://arjunrana.com/projects/bigquery_ML/
    Explore at:
    Available download formats: csv, xml, kmz, application/geo+json, kml, xlsx
    Dataset updated
    Nov 1, 2025
    Dataset authored and provided by
    Iowa Department of Revenue, Alcoholic Beverages
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains the spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase from January 1, 2012 to the present. The dataset can be used to analyze total spirits sales in Iowa of individual products at the store level.

    Class E liquor license, for grocery stores, liquor stores, convenience stores, etc., allows commercial establishments to sell liquor for off-premises consumption in original unopened containers.

  20. github-r-repos

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Daniel Falbel (2023). github-r-repos [Dataset]. https://huggingface.co/datasets/dfalbel/github-r-repos
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2023
    Authors
    Daniel Falbel
    License

    https://choosealicense.com/licenses/other/

    Description

    GitHub R repositories dataset

    R source files from GitHub. This dataset has been created using the public GitHub datasets from Google BigQuery. This is the actual query that has been used to export the data: EXPORT DATA OPTIONS ( uri = 'gs://your-bucket/gh-r/*.parquet', format = 'PARQUET') as ( select f.id, f.repo_name, f.path, c.content, c.size from ( SELECT distinct id, repo_name, path FROM bigquery-public-data.github_repos.files where ends_with(path… See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/github-r-repos.

Cite
Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset

Looker Ecommerce BigQuery Dataset

CSV version of BigQuery Looker Ecommerce Dataset

Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mustafa Keser
Description

Looker Ecommerce Dataset Description

CSV version of Looker Ecommerce Dataset.

Overview: Dataset in BigQuery. TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

1. distribution_centers.csv

  • Columns:
    • id: Unique identifier for each distribution center.
    • name: Name of the distribution center.
    • latitude: Latitude coordinate of the distribution center.
    • longitude: Longitude coordinate of the distribution center.

2. events.csv

  • Columns:
    • id: Unique identifier for each event.
    • user_id: Identifier for the user associated with the event.
    • sequence_number: Sequence number of the event.
    • session_id: Identifier for the session during which the event occurred.
    • created_at: Timestamp indicating when the event took place.
    • ip_address: IP address from which the event originated.
    • city: City where the event occurred.
    • state: State where the event occurred.
    • postal_code: Postal code of the event location.
    • browser: Web browser used during the event.
    • traffic_source: Source of the traffic leading to the event.
    • uri: Uniform Resource Identifier associated with the event.
    • event_type: Type of event recorded.

3. inventory_items.csv

  • Columns:
    • id: Unique identifier for each inventory item.
    • product_id: Identifier for the associated product.
    • created_at: Timestamp indicating when the inventory item was created.
    • sold_at: Timestamp indicating when the item was sold.
    • cost: Cost of the inventory item.
    • product_category: Category of the associated product.
    • product_name: Name of the associated product.
    • product_brand: Brand of the associated product.
    • product_retail_price: Retail price of the associated product.
    • product_department: Department to which the product belongs.
    • product_sku: Stock Keeping Unit (SKU) of the product.
    • product_distribution_center_id: Identifier for the distribution center associated with the product.

4. order_items.csv

  • Columns:
    • id: Unique identifier for each order item.
    • order_id: Identifier for the associated order.
    • user_id: Identifier for the user who placed the order.
    • product_id: Identifier for the associated product.
    • inventory_item_id: Identifier for the associated inventory item.
    • status: Status of the order item.
    • created_at: Timestamp indicating when the order item was created.
    • shipped_at: Timestamp indicating when the order item was shipped.
    • delivered_at: Timestamp indicating when the order item was delivered.
    • returned_at: Timestamp indicating when the order item was returned.

5. orders.csv

  • Columns:
    • order_id: Unique identifier for each order.
    • user_id: Identifier for the user who placed the order.
    • status: Status of the order.
    • gender: Gender information of the user.
    • created_at: Timestamp indicating when the order was created.
    • returned_at: Timestamp indicating when the order was returned.
    • shipped_at: Timestamp indicating when the order was shipped.
    • delivered_at: Timestamp indicating when the order was delivered.
    • num_of_item: Number of items in the order.

6. products.csv

  • Columns:
    • id: Unique identifier for each product.
    • cost: Cost of the product.
    • category: Category to which the product belongs.
    • name: Name of the product.
    • brand: Brand of the product.
    • retail_price: Retail price of the product.
    • department: Department to which the product belongs.
    • sku: Stock Keeping Unit (SKU) of the product.
    • distribution_center_id: Identifier for the distribution center associated with the product.

7. users.csv

  • Columns:
    • id: Unique identifier for each user.
    • first_name: First name of the user.
    • last_name: Last name of the user.
    • email: Email address of the user.
    • age: Age of the user.
    • gender: Gender of the user.
    • state: State where t...
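
The column listings above suggest how the CSV files relate: `order_items.product_id` refers to `products.id`, and `order_items.order_id` groups items into orders. As a minimal sketch of that relationship, the example below joins the two tables with the Python standard library and sums the retail price per order. The sample rows are invented purely for illustration and are not taken from the dataset; only the column names come from the listing, and the helper names `read_rows` and `retail_total_per_order` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows for illustration only; the real files are much larger
# and contain more columns than shown here.
PRODUCTS_CSV = """id,name,retail_price
1,Sample Tee,20.00
2,Sample Jeans,45.50
"""

ORDER_ITEMS_CSV = """id,order_id,product_id,status
10,100,1,Shipped
11,100,2,Shipped
12,101,1,Returned
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def retail_total_per_order(order_items, products):
    """Join order items to products on product_id and sum retail_price per order."""
    price_by_product = {p["id"]: float(p["retail_price"]) for p in products}
    totals = defaultdict(float)
    for item in order_items:
        totals[item["order_id"]] += price_by_product[item["product_id"]]
    return dict(totals)


totals = retail_total_per_order(read_rows(ORDER_ITEMS_CSV), read_rows(PRODUCTS_CSV))
print(totals)
```

With the downloaded files, the same functions could be fed from `open("products.csv")` and `open("order_items.csv")` instead of the in-memory strings, assuming the files use the column names listed above.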