100+ datasets found
  1. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview (Dataset in BigQuery): TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. A sample query over these tables is sketched after the file descriptions below.

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
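
    The column listings above are enough to sketch typical joins across these files. Below is a minimal BigQuery Standard SQL example; it assumes the data is queried from the hosted public dataset bigquery-public-data.thelook_ecommerce (the BigQuery counterpart of these CSVs) and that 'Complete' is a valid order-item status value, neither of which is stated in this listing, so adjust the names if you load the CSVs into your own project.

      -- Sketch: retail value of completed order items by product category.
      -- The dataset ID and the 'Complete' status value are assumptions.
      SELECT
        p.category,
        COUNT(*) AS items_sold,
        ROUND(SUM(p.retail_price), 2) AS retail_value
      FROM `bigquery-public-data.thelook_ecommerce.order_items` AS oi
      JOIN `bigquery-public-data.thelook_ecommerce.products` AS p
        ON oi.product_id = p.id
      WHERE oi.status = 'Complete'
      GROUP BY p.category
      ORDER BY retail_value DESC;
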
  2. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
    zip (365940125 bytes). Available download formats
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

    Content:
    • Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data.
    • Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics.
    • Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

    Usage:
    • Business Analytics: Users can analyze business trends, sales performance, and financial metrics.
    • Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation.
    • Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

    Example Use Cases:
    • Sales Analysis: Track and analyze sales performance across different regions and time periods.
    • Inventory Management: Monitor inventory levels and identify trends in stock movements.
    • Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.
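
    A minimal BigQuery Standard SQL sketch against the dataset ID given above (cloud-training-demos.SAP_REPLICATED_DATA). The table names follow the file names listed below, and the column names (bukrs, belnr, gjahr, wrbtr) are the standard SAP fields for document headers and line items; none of these are confirmed by this listing, so verify the schema in the BigQuery console first.

      -- Sketch: accounting-document totals per company code and fiscal year.
      -- Table and column names are assumptions based on standard SAP naming.
      SELECT
        h.bukrs AS company_code,
        h.gjahr AS fiscal_year,
        COUNT(DISTINCT h.belnr) AS documents,
        SUM(s.wrbtr) AS line_item_amount
      FROM `cloud-training-demos.SAP_REPLICATED_DATA.bkpf` AS h
      JOIN `cloud-training-demos.SAP_REPLICATED_DATA.bseg` AS s
        ON s.bukrs = h.bukrs AND s.belnr = h.belnr AND s.gjahr = h.gjahr
      GROUP BY company_code, fiscal_year
      ORDER BY fiscal_year, company_code;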

    Tables:

    • adr6.csv: Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    • adrc.csv: General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    • adrct.csv: Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    • adrt.csv: Address Details. Includes detailed address data such as street addresses, city, and country codes.
    • ankt.csv: Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    • anla.csv: Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    • bkpf.csv: Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    • bseg.csv: Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    • but000.csv: Business Partners. Contains basic information about business partners, including IDs and names.
    • but020.csv: Business Partner Addresses. Provides address details associated with business partners.
    • cepc.csv: Customer Master Data - Central. Contains centralized data for customer master records.
    • cepct.csv: Customer Master Data - Contact. Provides contact details associated with customer records.
    • csks.csv: Cost Center Master Data. Contains data about cost centers within the organization.
    • cskt.csv: Cost Center Texts. Provides text descriptions and labels for cost centers.
    • dd03l.csv: Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    • ekbe.csv: Purchase Order History. Details history of purchase orders, including quantities and values.
    • ekes.csv: Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    • eket.csv: Purchase Order Item History. Details changes and statuses for individual purchase order items.
    • ekkn.csv: Purchase Order Account Assignment. Provides account assignment details for purchas...
  3. Google BigQuery Business Intelligence Report

    • equityintel.ai
    json
    Updated Sep 26, 2025
    Cite
    Equity Intel (2025). Google BigQuery Business Intelligence Report [Dataset]. https://equityintel.ai/company/google-bigquery
    Explore at:
    json. Available download formats
    Dataset updated
    Sep 26, 2025
    Dataset provided by
    Intel (http://intel.com/)
    Authors
    Equity Intel
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    2010 - Present
    Area covered
    CA/USA, Mountain View
    Variables measured
    Team Size, Growth Rate, Funding Amount, Market Position, Employee Sentiment, Annual Recurring Revenue (ARR)
    Description

    Comprehensive business intelligence analysis for Google BigQuery including financial metrics, founder insights, competitive positioning, and investment research. This dataset contains AI-powered analysis of leadership interviews, public content, and market intelligence for due diligence and competitive research purposes.

  4. Reddit

    • redivis.com
    application/jsonl +7
    Updated Oct 27, 2021
    Cite
    Redivis Demo Organization (2021). Reddit [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    sas, stata, csv, avro, parquet, spss, application/jsonl, arrow. Available download formats
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Apr 12, 2006 - Aug 1, 2019
    Description

    Abstract

    Reddit posts, 2019-01-01 thru 2019-08-01.

    Documentation

    Source: https://console.cloud.google.com/bigquery?p=fh-bigquery&page=project

  5. SEC Public Dataset

    • console.cloud.google.com
    Updated Jul 19, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=en_GB (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=en_GB
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables, and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
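
    The description does not name the BigQuery dataset ID, so a safe first step is to list its tables (including the quick summary view) before writing analysis queries. The dataset ID below (bigquery-public-data.sec_quarterly_financials) is an assumption; check the Marketplace listing for the exact ID.

      -- Sketch: enumerate the dataset's tables and views before querying them.
      -- The dataset ID is an assumption, not taken from the listing above.
      SELECT table_name, table_type
      FROM `bigquery-public-data.sec_quarterly_financials.INFORMATION_SCHEMA.TABLES`
      ORDER BY table_name;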

  6. bird-critic-1.0-bigquery

    • huggingface.co
    Updated Jan 19, 2025
    Cite
    The BIRD Team (2025). bird-critic-1.0-bigquery [Dataset]. https://huggingface.co/datasets/birdsql/bird-critic-1.0-bigquery
    Explore at:
    Dataset updated
    Jan 19, 2025
    Dataset authored and provided by
    The BIRD Team
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    birdsql/bird-critic-1.0-bigquery dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. Data from: bigquery

    • huggingface.co
    Updated Aug 4, 2024
    + more versions
    Cite
    Dereje Hinsermu (2024). bigquery [Dataset]. https://huggingface.co/datasets/derekiya/bigquery
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 4, 2024
    Authors
    Dereje Hinsermu
    Description

    derekiya/bigquery dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. bigquery-gaming-analytics-dataset

    • huggingface.co
    Cite
    Jame, bigquery-gaming-analytics-dataset [Dataset]. https://huggingface.co/datasets/xc0110/bigquery-gaming-analytics-dataset
    Explore at:
    Authors
    Jame
    Description

    xc0110/bigquery-gaming-analytics-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Explore at:
    zip (0 bytes). Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unsplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
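
    As a starting point for the first question, a hedged sketch against the 311 table: the dataset ID (bigquery-public-data.new_york.311_service_requests) and the complaint_type, descriptor, and incident_address column names are assumptions based on the public NYC tables rather than this listing, so verify the schema before relying on them.

      -- Sketch: addresses with the most loud-party noise complaints.
      SELECT incident_address, COUNT(*) AS loud_party_complaints
      FROM `bigquery-public-data.new_york.311_service_requests`
      WHERE complaint_type LIKE 'Noise%'
        AND descriptor LIKE '%Loud Music/Party%'
        AND incident_address IS NOT NULL
      GROUP BY incident_address
      ORDER BY loud_party_complaints DESC
      LIMIT 10;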


  10. github_meta

    • huggingface.co
    Updated Aug 9, 2024
    Cite
    DeepGit (2024). github_meta [Dataset]. https://huggingface.co/datasets/deepgit/github_meta
    Explore at:
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    DeepGit
    License

    https://choosealicense.com/licenses/osl-3.0/

    Description

    Process to Generate DuckDB Dataset

      1. Load Repository Metadata
    

    Read repo_metadata.json from GitHub Public Repository Metadata. Normalize the JSON into three lists:

    • Repositories → general metadata (stars, forks, license, etc.)
    • Languages → repo-language mappings with size.
    • Topics → repo-topic mappings.

    Convert lists into Pandas DataFrames: df_repos, df_languages, df_topics.

      2. Enhance with BigQuery Data
    

    Create a temporary BigQuery table (repo_list)… See the full description on the dataset page: https://huggingface.co/datasets/deepgit/github_meta.

  11. apple-patents-bigquery

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    Sutro (2025). apple-patents-bigquery [Dataset]. https://huggingface.co/datasets/sutro/apple-patents-bigquery
    Explore at:
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    Sutro
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Apple patent dataset associated with: https://docs.sutro.sh/examples/large-scale-embeddings

      dataset_info:
        features:
          - name: publication_number
            dtype: large_string
          - name: application_number
            dtype: large_string
          - name: country_code
            dtype: large_string
          - name: kind_code
            dtype: large_string
          - name: patent_title
            dtype: large_string
          - name: patent_abstract
            dtype: large_string
          - name: patent_claims
            dtype: large_string
          - name: patent_description…

    See the full description on the dataset page: https://huggingface.co/datasets/sutro/apple-patents-bigquery.

  12. SEC Public Dataset

    • console.cloud.google.com
    Updated May 14, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-cn (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-cn
    Explore at:
    Dataset updated
    May 14, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables, and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  13. OpenAIRE Graph Training for Scientometrics Research

    • data.europa.eu
    unknown
    Updated May 7, 2025
    Cite
    Zenodo (2025). OpenAIRE Graph Training for Scientometrics Research [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-13981535?locale=no
    Explore at:
    unknown (4694366). Available download formats
    Dataset updated
    May 7, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation for a hands-on training session designed to help participants learn or refine their skills in analysing OpenAIRE Graph data from the Google Cloud with BigQuery. The workshop lasted 4 hours and alternated between presentations and hands-on practice with guidance from trainers. The training covered:

    • Introduction to Google Cloud and BigQuery
    • Introduction to the OpenAIRE Graph on BigQuery
    • Gentle introduction to SQL
    • Simple queries walkthrough and exercises
    • Advanced queries (e.g., with JOINs and BigQuery functions) walkthrough and exercises
    • Data takeout + Python notebooks on Google BigQuery

  14. Open Images

    • kaggle.com
    • opendatalab.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Open Images [Dataset]. https://www.kaggle.com/bigquery/open-images
    Explore at:
    zip (0 bytes). Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Labeled datasets are useful in machine learning research.

    Content

    This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.

    Tables: 1) annotations_bbox 2) dict 3) images 4) labels

    Update Frequency: Quarterly

    Querying BigQuery Tables

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images

    https://cloud.google.com/bigquery/public-data/openimages

    APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from github: https://github.com/openimages/dataset.

    Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.

    The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

    Banner Photo by Mattias Diesel from Unsplash.

    Inspiration

    Which labels are in the dataset?

    Which labels have "bus" in their display names?

    How many images of a trolleybus are in the dataset?

    What are some landing pages of images with a trolleybus?

    Which images with cherries are in the training set?
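
    A hedged sketch for the second question, assuming the public dataset ID bigquery-public-data.open_images and that the dict table maps label_name to label_display_name; confirm the column names in the BigQuery console before use.

      -- Sketch: labels whose display name contains "bus".
      SELECT label_name, label_display_name
      FROM `bigquery-public-data.open_images.dict`
      WHERE LOWER(label_display_name) LIKE '%bus%'
      LIMIT 20;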

  15. Stack Overflow BigQuery Dataset

    • live.european-language-grid.eu
    Updated Dec 30, 2018
    Cite
    Stack Overflow (2018). Stack Overflow BigQuery Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5094
    Explore at:
    Dataset updated
    Dec 30, 2018
    Dataset authored and provided by
    Stack Overflow (http://stackoverflow.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges.
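
    A minimal sketch of a query against this archive, assuming the public dataset ID bigquery-public-data.stackoverflow and a posts_questions table with tags and creation_date columns; neither is confirmed by this catalogue entry.

      -- Sketch: BigQuery-tagged questions per year.
      SELECT EXTRACT(YEAR FROM creation_date) AS year, COUNT(*) AS questions
      FROM `bigquery-public-data.stackoverflow.posts_questions`
      WHERE tags LIKE '%google-bigquery%'
      GROUP BY year
      ORDER BY year;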

  16. github-r-repos

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Daniel Falbel (2023). github-r-repos [Dataset]. https://huggingface.co/datasets/dfalbel/github-r-repos
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2023
    Authors
    Daniel Falbel
    License

    https://choosealicense.com/licenses/other/

    Description

    GitHub R repositories dataset

    R source files from GitHub. This dataset has been created using the public GitHub datasets from Google BigQuery. This is the actual query that has been used to export the data: EXPORT DATA OPTIONS ( uri = 'gs://your-bucket/gh-r/*.parquet', format = 'PARQUET') as ( select f.id, f.repo_name, f.path, c.content, c.size from ( SELECT distinct id, repo_name, path FROM bigquery-public-data.github_repos.files where ends_with(path… See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/github-r-repos.
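
    The query above is cut off by the page. A hypothetical reconstruction is sketched below, based on the public bigquery-public-data.github_repos schema (files and contents tables); the bucket URI is the author's placeholder and the ends_with filter on '.R' is an assumption, so this is not necessarily the exact query that was used.

      -- Hypothetical completion of the truncated export query; names assumed.
      EXPORT DATA OPTIONS (
        uri = 'gs://your-bucket/gh-r/*.parquet',
        format = 'PARQUET'
      ) AS (
        SELECT f.id, f.repo_name, f.path, c.content, c.size
        FROM (
          SELECT DISTINCT id, repo_name, path
          FROM `bigquery-public-data.github_repos.files`
          WHERE ENDS_WITH(path, '.R')  -- assumed filter for R source files
        ) AS f
        JOIN `bigquery-public-data.github_repos.contents` AS c
          ON f.id = c.id
      );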

  17. Ecommerce_bigQuery

    • kaggle.com
    Updated Oct 1, 2024
    Cite
    Chirag Givan (2024). Ecommerce_bigQuery [Dataset]. https://www.kaggle.com/datasets/chiraggivan82/ecommerce-bigquery
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chirag Givan
    Description

    About this Dataset

    Ecommerce data is typically proprietary and not shared by private companies. However, this dataset is sourced from Google Cloud's BigQuery public data. It comes from the "thelook_ecommerce" dataset, which consists of seven tables.

    Content

    This dataset contains transactional data spanning from 2019 to 2024, capturing all global consumer transactions. The company primarily sells a wide range of products, including clothing and accessories, catering to all age groups. The majority of its customers are based in the USA, China, and Brazil.

    Table Creation

    An additional data table was created from the Events table to track user sessions where a purchase was completed within the same session. This table includes details such as the date and time of the user's first interaction with the webpage, recorded as sequence number 1, as well as the date and time of the final purchase event, along with the corresponding sequence number for that session id.
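
    One way to derive such a session-level table is sketched below. It assumes the events schema listed for the thelook_ecommerce dataset earlier in these results (session_id, sequence_number, event_type, created_at) and a 'purchase' event_type value; both are assumptions for this particular upload.

      -- Sketch: first interaction and final purchase event per purchasing session.
      SELECT
        session_id,
        MIN(IF(sequence_number = 1, created_at, NULL)) AS first_event_at,
        MAX(IF(event_type = 'purchase', created_at, NULL)) AS purchase_at,
        MAX(IF(event_type = 'purchase', sequence_number, NULL)) AS purchase_sequence_number
      FROM `bigquery-public-data.thelook_ecommerce.events`
      GROUP BY session_id
      HAVING purchase_at IS NOT NULL;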

  18. bigquery-swift-filtered-no-duplicate

    • huggingface.co
    Updated Aug 23, 2023
    Cite
    Andrea Parolin (2023). bigquery-swift-filtered-no-duplicate [Dataset]. https://huggingface.co/datasets/drewparo/bigquery-swift-filtered-no-duplicate
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 23, 2023
    Authors
    Andrea Parolin
    Description

    Dataset Card for "bigquery-swift-unfiltered-no-duplicate"

    More Information needed

  19. Cloud Data Warehouse Solutions Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 15, 2025
    Cite
    Data Insights Market (2025). Cloud Data Warehouse Solutions Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-data-warehouse-solutions-1385894
    Explore at:
    doc, pdf, ppt. Available download formats
    Dataset updated
    Aug 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Cloud Data Warehouse (CDW) solutions market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and secure data storage and analytics solutions across various industries. The market's expansion is fueled by several factors, including the proliferation of big data, the rise of cloud computing adoption, and the growing demand for real-time business intelligence. Organizations are migrating from on-premise data warehouses to cloud-based solutions to leverage the benefits of scalability, elasticity, and pay-as-you-go pricing models. This shift is further accelerated by the increasing complexity of data management and the need for advanced analytics capabilities to gain actionable insights from vast datasets. Competition is fierce, with major players like Amazon Redshift, Snowflake, Google Cloud, and Microsoft Azure Synapse leading the market, each offering unique strengths and capabilities. However, the market also witnesses the emergence of niche players catering to specific industry needs or geographical regions. The overall market is segmented based on deployment models (public, private, hybrid), service models (SaaS, PaaS, IaaS), and industry verticals (finance, healthcare, retail, etc.). Future growth will likely be influenced by advancements in technologies such as AI, machine learning, and serverless computing, further enhancing the analytical capabilities of CDW solutions. The projected Compound Annual Growth Rate (CAGR) suggests a substantial increase in market value over the forecast period (2025-2033). Assuming a conservative CAGR of 15% (a reasonable estimate considering the rapid technological advancements in this space), and a 2025 market size of $50 billion (a reasonable estimate based on industry reports), the market is poised for significant expansion. This growth will be influenced by factors such as increasing data volumes, advancements in data analytics techniques, and the growing adoption of cloud-based technologies by small and medium-sized businesses (SMBs). Despite the rapid growth, challenges remain, including data security concerns, integration complexities, and vendor lock-in. However, continuous innovation and the development of robust security measures will mitigate these challenges, paving the way for sustained market growth in the coming years.

  20. Google Trends

    • console.cloud.google.com
    Updated Jun 11, 2022
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ES (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=ES
    Explore at:
    Dataset updated
    Jun 11, 2022
    Dataset provided by
    Google Search (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
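
    A hedged sketch of a query on the daily top terms, assuming the public dataset ID bigquery-public-data.google_trends and a top_terms table with term, rank, week, dma_name, and refresh_date columns; verify these names in the BigQuery console before use.

      -- Sketch: top 5 terms per US designated market area from the latest refresh.
      SELECT term, rank, week, dma_name
      FROM `bigquery-public-data.google_trends.top_terms`
      WHERE refresh_date = (SELECT MAX(refresh_date)
                            FROM `bigquery-public-data.google_trends.top_terms`)
        AND rank <= 5
      ORDER BY dma_name, rank
      LIMIT 50;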
