100+ datasets found

Looker Ecommerce BigQuery Dataset
kaggle.com
Updated Jan 18, 2024
Cite
Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
Dataset updated
Jan 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mustafa Keser
Description
Looker Ecommerce Dataset Description

CSV version of Looker Ecommerce Dataset.

Overview

Dataset in BigQuery: TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

1. distribution_centers.csv

Columns:

id: Unique identifier for each distribution center.

name: Name of the distribution center.

latitude: Latitude coordinate of the distribution center.

longitude: Longitude coordinate of the distribution center.

2. events.csv

Columns:

id: Unique identifier for each event.

user_id: Identifier for the user associated with the event.

sequence_number: Sequence number of the event.

session_id: Identifier for the session during which the event occurred.

created_at: Timestamp indicating when the event took place.

ip_address: IP address from which the event originated.

city: City where the event occurred.

state: State where the event occurred.

postal_code: Postal code of the event location.

browser: Web browser used during the event.

traffic_source: Source of the traffic leading to the event.

uri: Uniform Resource Identifier associated with the event.

event_type: Type of event recorded.

3. inventory_items.csv

Columns:

id: Unique identifier for each inventory item.

product_id: Identifier for the associated product.

created_at: Timestamp indicating when the inventory item was created.

sold_at: Timestamp indicating when the item was sold.

cost: Cost of the inventory item.

product_category: Category of the associated product.

product_name: Name of the associated product.

product_brand: Brand of the associated product.

product_retail_price: Retail price of the associated product.

product_department: Department to which the product belongs.

product_sku: Stock Keeping Unit (SKU) of the product.

product_distribution_center_id: Identifier for the distribution center associated with the product.

4. order_items.csv

Columns:

id: Unique identifier for each order item.

order_id: Identifier for the associated order.

user_id: Identifier for the user who placed the order.

product_id: Identifier for the associated product.

inventory_item_id: Identifier for the associated inventory item.

status: Status of the order item.

created_at: Timestamp indicating when the order item was created.

shipped_at: Timestamp indicating when the order item was shipped.

delivered_at: Timestamp indicating when the order item was delivered.

returned_at: Timestamp indicating when the order item was returned.

5. orders.csv

Columns:

order_id: Unique identifier for each order.

user_id: Identifier for the user who placed the order.

status: Status of the order.

gender: Gender information of the user.

created_at: Timestamp indicating when the order was created.

returned_at: Timestamp indicating when the order was returned.

shipped_at: Timestamp indicating when the order was shipped.

delivered_at: Timestamp indicating when the order was delivered.

num_of_item: Number of items in the order.

6. products.csv

Columns:

id: Unique identifier for each product.

cost: Cost of the product.

category: Category to which the product belongs.

name: Name of the product.

brand: Brand of the product.

retail_price: Retail price of the product.

department: Department to which the product belongs.

sku: Stock Keeping Unit (SKU) of the product.

distribution_center_id: Identifier for the distribution center associated with the product.

7. users.csv

Columns:

id: Unique identifier for each user.

first_name: First name of the user.

last_name: Last name of the user.

email: Email address of the user.

age: Age of the user.

gender: Gender of the user.

state: State where t...
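As a rough illustration of how the tables listed above relate to one another, the sketch below joins order_items rows to products rows through product_id and counts items per product category. The rows are invented toy data, the status values are assumptions, and items_per_category is a hypothetical helper, not part of the dataset.

```python
from collections import Counter

# Toy rows invented for illustration; column names follow the listing above.
products = [
    {"id": 1, "category": "Jeans", "retail_price": 59.0},
    {"id": 2, "category": "Socks", "retail_price": 8.5},
]
order_items = [
    {"id": 10, "order_id": 100, "product_id": 1, "status": "Complete"},  # status values are assumed
    {"id": 11, "order_id": 100, "product_id": 2, "status": "Complete"},
    {"id": 12, "order_id": 101, "product_id": 2, "status": "Returned"},
]


def items_per_category(order_items, products):
    """Count order items per product category (order_items.product_id -> products.id)."""
    products_by_id = {p["id"]: p for p in products}
    counts = Counter()
    for item in order_items:
        product = products_by_id.get(item["product_id"])
        if product is not None:  # skip items whose product is missing
            counts[product["category"]] += 1
    return dict(counts)


category_counts = items_per_category(order_items, products)
print(category_counts)
```

With the real CSV files, the same lookup could be fed from csv.DictReader rows instead of the toy lists.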
Data from: Hacker News
console.cloud.google.com
Updated Aug 10, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Y%20Combinator&hl=en-GB (2023). Hacker News [Dataset]. https://console.cloud.google.com/marketplace/product/y-combinator/hacker-news?hl=en-GB
Explore at:
Dataset updated
Aug 10, 2023
Dataset provided by
Google (http://google.com/)
Description
This dataset contains all stories and comments from Hacker News from its launch in 2006 to present. Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Google Ads Transparency Center
console.cloud.google.com
Updated Sep 6, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
Explore at:
Dataset updated
Sep 6, 2023
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description
This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website.

About BigQuery

This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

Download Dataset

This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions.

Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions here to download and install the gcloud CLI.

To remove the login requirement, run:

$ gcloud config set auth/disable_credentials True

To download the dataset, run:

$ gcloud storage cp gs://ads-transparency-center/* . -R
BigQuery - Data Processing Queries
kaggle.com
zip
Updated Dec 2, 2020
Cite
Reuben Pereira (2020). BigQuery - Data Processing Queries [Dataset]. https://www.kaggle.com/reubencpereira/bigquery-data-processing-queries
Explore at:
Available download formats: zip (30758 bytes)
Dataset updated
Dec 2, 2020
Authors
Reuben Pereira
Description
Dataset

This dataset was created by Reuben Pereira

Contents
Data from: bigquery
huggingface.co
Updated Aug 4, 2024
+ more versions
Cite
Dereje Hinsermu (2024). bigquery [Dataset]. https://huggingface.co/datasets/derekiya/bigquery
Dataset updated
Aug 4, 2024
Authors
Dereje Hinsermu
Description
derekiya/bigquery dataset hosted on Hugging Face and contributed by the HF Datasets community
noaa-global-forecast-system
console.cloud.google.com
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data, noaa-global-forecast-system [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/noaa-global-forecast-system
Explore at:
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description
The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). The GFS dataset consists of selected model outputs (described below) as gridded forecast variables. The 384-hour forecasts, with 3-hour forecast interval, are made at 6-hour temporal resolution (i.e. updated four times daily). Use the 'creation_time' and 'forecast_time' properties to select data of interest. The GFS is a coupled model, composed of an atmosphere model, an ocean model, a land/soil model, and a sea ice model which work together to provide an accurate picture of weather conditions. See history of recent modifications to the global forecast/analysis system, the model performance statistical web page, and the documentation homepage for more information.
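The forecast cadence described above can be worked through with a short sketch. It only restates the arithmetic in the description (a 384-hour horizon in 3-hour steps, with a new run every 6 hours); the constant and function names are hypothetical, and including the zero-hour step in the list is an assumption.

```python
from datetime import datetime, timedelta

# Figures taken from the description; the names are illustrative only.
FORECAST_HORIZON_HOURS = 384
FORECAST_STEP_HOURS = 3
RUN_INTERVAL_HOURS = 6


def runs_per_day():
    """Number of forecast runs created per day at a 6-hour cadence."""
    return 24 // RUN_INTERVAL_HOURS


def forecast_times(creation_time):
    """Forecast valid times for one run, assuming the zero-hour step is included."""
    steps = FORECAST_HORIZON_HOURS // FORECAST_STEP_HOURS
    return [
        creation_time + timedelta(hours=FORECAST_STEP_HOURS * i)
        for i in range(steps + 1)
    ]


times = forecast_times(datetime(2024, 1, 1, 0, 0))
print(runs_per_day(), len(times), times[-1] - times[0])
```

Filtering on 'creation_time' picks one run, and filtering on 'forecast_time' picks a step within that run.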
Data from: Stack Overflow
console.cloud.google.com
Updated Aug 13, 2024
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&hl=id (2024). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/details/stack-exchange/stack-overflow?hl=id
Explore at:
Dataset updated
Aug 13, 2024
Dataset provided by
Google (http://google.com/)
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datagov/usa-names
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
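
The first questions above could be approached along the following lines. This is only a sketch on invented toy rows; the field names name, gender, and number are assumptions about the table layout, and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy rows; the field names are assumed, not taken from the listing.
rows = [
    {"name": "Mary", "gender": "F", "number": 7},
    {"name": "James", "gender": "M", "number": 5},
    {"name": "Mary", "gender": "F", "number": 6},
    {"name": "Linda", "gender": "F", "number": 9},
]


def most_common_names(rows, top_n):
    """Sum occurrences per name and return the top_n names with their totals."""
    totals = Counter()
    for row in rows:
        totals[row["name"]] += row["number"]
    return totals.most_common(top_n)


def distinct_names_by_gender(rows):
    """Count how many distinct names appear for each gender."""
    names = defaultdict(set)
    for row in rows:
        names[row["gender"]].add(row["name"])
    return {gender: len(found) for gender, found in names.items()}


top_names = most_common_names(rows, 2)
name_counts = distinct_names_by_gender(rows)
print(top_names, name_counts)
```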
Ecommerce_bigQuery
kaggle.com
Updated Oct 1, 2024
Cite
Chirag Givan (2024). Ecommerce_bigQuery [Dataset]. https://www.kaggle.com/datasets/chiraggivan82/ecommerce-bigquery
Dataset updated
Oct 1, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chirag Givan
Description
About this Dataset

Ecommerce data is typically proprietary and not shared by private companies. However, this dataset is sourced from Google Cloud's BigQuery public data. It comes from the "thelook_ecommerce" dataset, which consists of seven tables.

Content

This dataset contains transactional data spanning from 2019 to 2024, capturing all global consumer transactions. The company primarily sells a wide range of products, including clothing and accessories, catering to all age groups. The majority of its customers are based in the USA, China, and Brazil.

Table Creation

An additional data table was created from the Events table to track user sessions where a purchase was completed within the same session. This table includes details such as the date and time of the user's first interaction with the webpage, recorded as sequence number 1, as well as the date and time of the final purchase event, along with the corresponding sequence number for that session id.
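
The derived session table described above might be built roughly as follows. This is a minimal sketch on invented events, not the author's actual query; the event_type value "purchase" and the output field names are assumptions, and purchase_sessions is a hypothetical helper.

```python
# Invented toy events; column names follow the events table, values are assumed.
events = [
    {"session_id": "s1", "sequence_number": 1, "created_at": "2024-01-01T10:00:00", "event_type": "home"},
    {"session_id": "s1", "sequence_number": 2, "created_at": "2024-01-01T10:03:00", "event_type": "cart"},
    {"session_id": "s1", "sequence_number": 3, "created_at": "2024-01-01T10:05:00", "event_type": "purchase"},
    {"session_id": "s2", "sequence_number": 1, "created_at": "2024-01-02T09:00:00", "event_type": "home"},
]


def purchase_sessions(events):
    """Return one row per session that contains a purchase event."""
    first_seen = {}     # session_id -> timestamp of the sequence_number == 1 event
    last_purchase = {}  # session_id -> purchase event with the highest sequence number
    for event in events:
        sid = event["session_id"]
        if event["sequence_number"] == 1:
            first_seen[sid] = event["created_at"]
        if event["event_type"] == "purchase":
            previous = last_purchase.get(sid)
            if previous is None or event["sequence_number"] > previous["sequence_number"]:
                last_purchase[sid] = event
    return [
        {
            "session_id": sid,
            "first_event_at": first_seen.get(sid),
            "purchase_at": purchase["created_at"],
            "purchase_sequence_number": purchase["sequence_number"],
        }
        for sid, purchase in sorted(last_purchase.items())
    ]


session_rows = purchase_sessions(events)
print(session_rows)
```

Sessions without a purchase event, such as s2 in the toy data, are left out of the result.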
Cloud Data Warehouse Solutions Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 15, 2025
Cite
Data Insights Market (2025). Cloud Data Warehouse Solutions Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-data-warehouse-solutions-1385894
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Aug 15, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Cloud Data Warehouse (CDW) solutions market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and secure data storage and analytics solutions across various industries. The market's expansion is fueled by several factors, including the proliferation of big data, the rise of cloud computing adoption, and the growing demand for real-time business intelligence. Organizations are migrating from on-premise data warehouses to cloud-based solutions to leverage the benefits of scalability, elasticity, and pay-as-you-go pricing models. This shift is further accelerated by the increasing complexity of data management and the need for advanced analytics capabilities to gain actionable insights from vast datasets. Competition is fierce, with major players like Amazon Redshift, Snowflake, Google Cloud, and Microsoft Azure Synapse leading the market, each offering unique strengths and capabilities. However, the market also witnesses the emergence of niche players catering to specific industry needs or geographical regions. The overall market is segmented based on deployment models (public, private, hybrid), service models (SaaS, PaaS, IaaS), and industry verticals (finance, healthcare, retail, etc.). Future growth will likely be influenced by advancements in technologies such as AI, machine learning, and serverless computing, further enhancing the analytical capabilities of CDW solutions. The projected Compound Annual Growth Rate (CAGR) suggests a substantial increase in market value over the forecast period (2025-2033). Assuming a conservative CAGR of 15% (a reasonable estimate considering the rapid technological advancements in this space), and a 2025 market size of $50 billion (a reasonable estimate based on industry reports), the market is poised for significant expansion. 
This growth will be influenced by factors such as increasing data volumes, advancements in data analytics techniques, and the growing adoption of cloud-based technologies by small and medium-sized businesses (SMBs). Despite the rapid growth, challenges remain, including data security concerns, integration complexities, and vendor lock-in. However, continuous innovation and the development of robust security measures will mitigate these challenges, paving the way for sustained market growth in the coming years.
AlphaFold Protein Structure Database
console.cloud.google.com
Updated Aug 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=en-GB (2023). AlphaFold Protein Structure Database [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/deepmind-alphafold?hl=en-GB
Explore at:
Dataset updated
Aug 9, 2023
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description
The AlphaFold Protein Structure Database is a collection of protein structure predictions made using the machine learning model AlphaFold. AlphaFold was developed by DeepMind, and this database was created in partnership with EMBL-EBI. For information on how to interpret, download and query the data, as well as on which proteins are included / excluded, and change log, please see our main dataset guide and FAQs. To interactively view individual entries or to download proteomes / Swiss-Prot please visit https://alphafold.ebi.ac.uk/. The current release aims to cover most of the over 200M sequences in UniProt (a commonly used reference set of annotated proteins). The files provided for each entry include the structure plus two model confidence metrics (pLDDT and PAE). The files can be found in the Google Cloud Storage bucket gs://public-datasets-deepmind-alphafold-v4 with metadata in the BigQuery table bigquery-public-data.deepmind_alphafold.metadata.

If you use this data, please cite:

Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021)

Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021)

This public dataset is hosted in Google Cloud Storage and is available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
Google Trends
console.cloud.google.com
Updated Jun 11, 2022
+ more versions
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ES (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=ES
Explore at:
Dataset updated
Jun 11, 2022
Dataset provided by
Google Search (http://google.com/)
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description
The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
github_meta
huggingface.co
Updated Aug 9, 2024
Cite
DeepGit (2024). github_meta [Dataset]. https://huggingface.co/datasets/deepgit/github_meta
Explore at:
Dataset updated
Aug 9, 2024
Dataset authored and provided by
DeepGit
License
https://choosealicense.com/licenses/osl-3.0/
Description
Process to Generate DuckDB Dataset

1. Load Repository Metadata

Read repo_metadata.json from GitHub Public Repository Metadata.

Normalize JSON into three lists:

Repositories → general metadata (stars, forks, license, etc.).

Languages → repo-language mappings with size.

Topics → repo-topic mappings.

Convert lists into Pandas DataFrames: df_repos, df_languages, df_topics.

2. Enhance with BigQuery Data

Create a temporary BigQuery table (repo_list)… See the full description on the dataset page: https://huggingface.co/datasets/deepgit/github_meta.
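
Step 1 above (normalizing the repository JSON into three lists) could look roughly like the sketch below. The JSON layout and every field name here are hypothetical, since the listing does not show the structure of repo_metadata.json, and normalize is an illustrative helper rather than the dataset author's code.

```python
import json

# Hypothetical input shape; the real repo_metadata.json layout is not shown in the listing.
raw = json.loads("""
[
  {"name": "octo/demo", "stars": 12, "forks": 3, "license": "MIT",
   "languages": [{"name": "Python", "size": 2048}, {"name": "C", "size": 512}],
   "topics": ["etl", "duckdb"]},
  {"name": "octo/empty", "stars": 0, "forks": 0, "license": null,
   "languages": [], "topics": []}
]
""")


def normalize(repos_json):
    """Split nested repository records into three flat lists of dicts."""
    repos, languages, topics = [], [], []
    for repo in repos_json:
        repos.append({
            "repo": repo["name"],
            "stars": repo.get("stars"),
            "forks": repo.get("forks"),
            "license": repo.get("license"),
        })
        for lang in repo.get("languages", []):
            languages.append({"repo": repo["name"], "language": lang["name"], "size": lang["size"]})
        for topic in repo.get("topics", []):
            topics.append({"repo": repo["name"], "topic": topic})
    return repos, languages, topics


repos, languages, topics = normalize(raw)
print(len(repos), len(languages), len(topics))
```

Each of the three lists is a list of flat dicts, which is a shape that pandas.DataFrame accepts directly when building df_repos, df_languages, and df_topics.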
Google's Diversity Annual Report Data
console.cloud.google.com
Updated May 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse(cameo:product/rivery-public/rivery)?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ja (2023). Google's Diversity Annual Report Data [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-diversity-annual-report(cameo:product/rivery-public/rivery)?hl=ja
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description
This dataset contains current and historical demographic data on Google's workforce since the company began publishing diversity data in 2014. It includes data collected for government reporting and voluntary employee self-identification globally relating to hiring, retention, and representation categorized by race, gender, sexual orientation, gender identity, disability status, and military status. In some instances, the data is limited due to various government policies around the world and the desire to protect Googler confidentiality. All data in this dataset will be updated yearly upon publication of Google's Diversity Annual Report. Google uses this data to inform its diversity, equity, and inclusion work. More information on our methodology can be found in the Diversity Annual Report. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
COVID-19 Open Data
console.cloud.google.com
Updated Jun 22, 2023
+ more versions
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=fr (2023). COVID-19 Open Data [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-open-data?hl=fr
Explore at:
Dataset updated
Jun 22, 2023
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This repository contains the largest COVID-19 epidemiological database available in addition to a powerful set of expansive covariates. It includes open sourced data with a permissive license (enabling commercial use) relating to vaccinations, epidemiology, hospitalizations, demographics, economy, geography, health, mobility, government response, weather, and more. Moreover, the data merges daily time-series from hundreds of data sources at a fine spatial resolution, containing over 20,000 locations and using a consistent set of region keys. This dataset is available in both the US and EU regions of BigQuery at the following links: COVID-19 Open Data: US Region; COVID-19 Open Data: EU Region. All data in this dataset is retrieved automatically. When possible, data is retrieved directly from the relevant authorities, like a country's ministry of health. This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Big Data Processing And Distribution Systems Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 6, 2025
Cite
Data Insights Market (2025). Big Data Processing And Distribution Systems Report [Dataset]. https://www.datainsightsmarket.com/reports/big-data-processing-and-distribution-systems-528339
Explore at:
Available download formats: ppt, pdf, doc
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Big Data Processing and Distribution Systems market is experiencing robust growth, driven by the exponential increase in data volume across various industries. The market, estimated at $50 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $150 billion by 2033. This expansion is fueled by several key factors. The rising adoption of cloud-based solutions, offering scalability and cost-effectiveness, is a significant driver. Furthermore, the increasing demand for real-time analytics and advanced data processing capabilities across sectors like finance, healthcare, and e-commerce are propelling market growth. The emergence of new technologies such as edge computing and AI-powered analytics is further accelerating the adoption of sophisticated big data processing solutions. However, market growth is not without its challenges. Data security and privacy concerns, coupled with the complexity of implementing and managing big data systems, remain significant restraints. The need for specialized skills and expertise in data science and engineering also contributes to the overall cost and complexity of adoption. Despite these challenges, the market's continued expansion is anticipated, driven by the persistent need for efficient and insightful data management in an increasingly data-driven world. Segmentation within the market is diverse, encompassing various solutions including cloud-based platforms, on-premise systems, and specialized tools for data integration, processing, and visualization. Leading players such as Google, AWS, Microsoft, Snowflake, and Databricks are fiercely competing to capture market share, further stimulating innovation and driving market expansion.
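The projection quoted above can be checked with a short compounding calculation. The sketch below uses only the figures given in the description ($50 billion in 2025, 15% CAGR through 2033); project_market_size is a hypothetical helper name.

```python
def project_market_size(base_value, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return base_value * (1 + cagr) ** years


# Figures from the description: $50 billion in 2025, 15% CAGR, forecast to 2033.
projected_2033 = project_market_size(50.0, 0.15, 2033 - 2025)
print(round(projected_2033, 1))  # in billions of dollars
```

Eight years of 15% growth multiply the base by about 3.06, giving roughly $153 billion, which is consistent with the "approximately $150 billion" stated in the description.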
Austin Crime Data
console.cloud.google.com
Updated Apr 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:City%20of%20Austin&hl=de (2023). Austin Crime Data [Dataset]. https://console.cloud.google.com/marketplace/product/city-of-austin/austin-crime?hl=de
Explore at:
Dataset updated
Apr 28, 2023
Dataset provided by
Google (http://google.com/)
Area covered
Austin
Description
This dataset includes Part 1 crimes (as defined by Uniform Crime Reporting Statistics) for 2014 and 2015. Data is provided by the Austin Police Department and may differ from official APD crime data due to the variety of reporting and collection methods used. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
COVID-19 Data Repository by CSSE at JHU
console.cloud.google.com
Updated Mar 26, 2023
+ more versions
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Johns%20Hopkins%20University&hl=it (2023). COVID-19 Data Repository by CSSE at JHU [Dataset]. https://console.cloud.google.com/marketplace/product/johnshopkins/covid19_jhu_global_case?hl=it
Dataset updated
Mar 26, 2023
Dataset provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The data include the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries, aggregated at the appropriate province/state. It was developed to enable researchers, public health authorities and the general public to track the outbreak. Additional information is available in the blog post Mapping 2019-nCoV, and included data sources are listed here. For publications that use the data, please cite the following publication: Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.
Data from NCI Imaging Data Commons
console.cloud.google.com
Updated Apr 7, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=pt (2023). Data from NCI Imaging Data Commons [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/nci-idc-data?hl=pt
Dataset updated
Apr 7, 2023
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description
Imaging Data Commons (IDC) is a repository within the Cancer Research Data Commons (CRDC) that manages imaging data and enables its integration with the other components of CRDC. Further details about IDC are available in this publication. IDC hosts a growing number of imaging collections that are contributed either by funded US National Cancer Institute (NCI) data collection activities or by individual researchers. Image data hosted by IDC is stored in DICOM format. This public dataset consists of the following components: 1. BigQuery metadata tables: these include DICOM metadata attributes extracted from the DICOM data into BigQuery tables, which are further enriched by including collection-level metadata that is not available in DICOM. 2. DICOM files: these files are available in Storage buckets. This dataset is rather large (~40TB) and is updated monthly, which makes it challenging to download all of the files. Instead, users should use the BigQuery tables to search for and identify files of interest, which can then be downloaded selectively from Cloud Storage. Please see the download instructions page in the Imaging Data Commons documentation: https://learn.canceridc.dev/data/downloading-data. See further details about data organization in the IDC documentation.
Cloud Data Warehouse Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 4, 2025
Cite
Data Insights Market (2025). Cloud Data Warehouse Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-data-warehouse-1958553
Available download formats: doc, ppt, pdf
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The cloud data warehouse market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and readily accessible data analytics solutions. The market's expansion is fueled by several key factors, including the burgeoning adoption of cloud computing across various industries, the proliferation of big data, and the growing demand for real-time business intelligence. Organizations are migrating from on-premise data warehouses to cloud-based solutions to leverage enhanced scalability, reduced infrastructure costs, and improved agility. This shift is further accelerated by the availability of advanced analytics tools and services within the cloud ecosystem, enabling businesses to derive actionable insights from their data more efficiently. Competitive pressures and the need to gain a competitive edge are also significant drivers, pushing enterprises to adopt sophisticated data warehousing solutions capable of handling complex analytical workloads. The market is highly fragmented, with major players such as Amazon, Google, Microsoft, and others competing intensely through innovation, strategic partnerships, and aggressive pricing strategies. While the market shows significant promise, certain challenges persist. Data security and privacy concerns remain a major obstacle to wider adoption, particularly in regulated industries. Integration complexities with existing on-premise systems and the need for skilled professionals to manage and maintain cloud data warehouses also present hurdles. However, ongoing technological advancements in areas such as data encryption, access control, and automated data integration are mitigating these challenges. Furthermore, the emergence of new technologies, such as serverless architectures and AI-powered analytics, is continuously reshaping the market landscape, fostering innovation and expanding the market's potential. 
Over the forecast period (2025-2033), consistent growth is anticipated, fueled by ongoing digital transformation initiatives across various sectors. We estimate a conservative CAGR (considering industry averages for similar tech sectors) of 15% over this period, indicating substantial growth opportunities.

Cite

Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset

Looker Ecommerce BigQuery Dataset

CSV version of BigQuery Looker Ecommerce Dataset

Dataset updated

Jan 18, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Mustafa Keser

Description

Looker Ecommerce Dataset Description

CSV version of Looker Ecommerce Dataset.

Overview

Dataset in BigQuery

TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

1. `distribution_centers.csv`

Columns:
- id: Unique identifier for each distribution center.
- name: Name of the distribution center.
- latitude: Latitude coordinate of the distribution center.
- longitude: Longitude coordinate of the distribution center.

2. `events.csv`

Columns:
- id: Unique identifier for each event.
- user_id: Identifier for the user associated with the event.
- sequence_number: Sequence number of the event.
- session_id: Identifier for the session during which the event occurred.
- created_at: Timestamp indicating when the event took place.
- ip_address: IP address from which the event originated.
- city: City where the event occurred.
- state: State where the event occurred.
- postal_code: Postal code of the event location.
- browser: Web browser used during the event.
- traffic_source: Source of the traffic leading to the event.
- uri: Uniform Resource Identifier associated with the event.
- event_type: Type of event recorded.
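
One way the `session_id` and `sequence_number` columns might be used together is to rebuild the ordered path of events within each session. The sketch below is illustrative only: the sample rows and the `event_type` values (`home`, `product`) are made up, and only the column names are taken from the listing above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the header matches the documented columns.
EVENTS_CSV = """id,user_id,sequence_number,session_id,created_at,ip_address,city,state,postal_code,browser,traffic_source,uri,event_type
3,9,2,s1,2024-01-01 10:01:00,203.0.113.5,Austin,Texas,78701,Chrome,Email,/product/1,product
1,9,1,s1,2024-01-01 10:00:00,203.0.113.5,Austin,Texas,78701,Chrome,Email,/home,home
2,7,1,s2,2024-01-01 11:00:00,198.51.100.7,Boston,Massachusetts,02108,Firefox,Search,/home,home
"""


def event_paths(events_file):
    """Return {session_id: [event_type, ...]} ordered by sequence_number."""
    sessions = defaultdict(list)
    for row in csv.DictReader(events_file):
        # Store (sequence_number, event_type) so sorting orders the events.
        sessions[row["session_id"]].append(
            (int(row["sequence_number"]), row["event_type"])
        )
    return {
        session_id: [event_type for _, event_type in sorted(events)]
        for session_id, events in sessions.items()
    }


paths = event_paths(io.StringIO(EVENTS_CSV))
print(paths)
```

To run this against the real file, pass `open("events.csv", newline="")` instead of the in-memory sample.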

3. `inventory_items.csv`

Columns:
- id: Unique identifier for each inventory item.
- product_id: Identifier for the associated product.
- created_at: Timestamp indicating when the inventory item was created.
- sold_at: Timestamp indicating when the item was sold.
- cost: Cost of the inventory item.
- product_category: Category of the associated product.
- product_name: Name of the associated product.
- product_brand: Brand of the associated product.
- product_retail_price: Retail price of the associated product.
- product_department: Department to which the product belongs.
- product_sku: Stock Keeping Unit (SKU) of the product.
- product_distribution_center_id: Identifier for the distribution center associated with the product.

4. `order_items.csv`

Columns:
- id: Unique identifier for each order item.
- order_id: Identifier for the associated order.
- user_id: Identifier for the user who placed the order.
- product_id: Identifier for the associated product.
- inventory_item_id: Identifier for the associated inventory item.
- status: Status of the order item.
- created_at: Timestamp indicating when the order item was created.
- shipped_at: Timestamp indicating when the order item was shipped.
- delivered_at: Timestamp indicating when the order item was delivered.
- returned_at: Timestamp indicating when the order item was returned.

5. `orders.csv`

Columns:
- order_id: Unique identifier for each order.
- user_id: Identifier for the user who placed the order.
- status: Status of the order.
- gender: Gender information of the user.
- created_at: Timestamp indicating when the order was created.
- returned_at: Timestamp indicating when the order was returned.
- shipped_at: Timestamp indicating when the order was shipped.
- delivered_at: Timestamp indicating when the order was delivered.
- num_of_item: Number of items in the order.

6. `products.csv`

Columns:
- id: Unique identifier for each product.
- cost: Cost of the product.
- category: Category to which the product belongs.
- name: Name of the product.
- brand: Brand of the product.
- retail_price: Retail price of the product.
- department: Department to which the product belongs.
- sku: Stock Keeping Unit (SKU) of the product.
- distribution_center_id: Identifier for the distribution center associated with the product.

7. `users.csv`

Columns:
- id: Unique identifier for each user.
- first_name: First name of the user.
- last_name: Last name of the user.
- email: Email address of the user.
- age: Age of the user.
- gender: Gender of the user.
- state: State where t...
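
The column descriptions above suggest that `order_items.product_id` refers to `products.id`. Assuming that relationship holds, the sketch below joins the two files with the standard library and sums the retail price of non-returned items per product category. The sample rows and the `status` values are hypothetical; only the column names come from the listing.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the headers match the documented columns.
PRODUCTS_CSV = """id,cost,category,name,brand,retail_price,department,sku,distribution_center_id
1,10.0,Jeans,Slim Jeans,Acme,40.0,Men,SKU1,1
2,5.0,Socks,Wool Socks,Acme,12.5,Women,SKU2,2
"""

ORDER_ITEMS_CSV = """id,order_id,user_id,product_id,inventory_item_id,status,created_at,shipped_at,delivered_at,returned_at
100,500,9,1,1000,Complete,2024-01-01,2024-01-02,2024-01-04,
101,500,9,2,1001,Returned,2024-01-01,2024-01-02,2024-01-04,2024-01-09
102,501,7,2,1002,Complete,2024-01-03,2024-01-04,2024-01-06,
"""


def retail_value_by_category(products_file, order_items_file):
    """Sum retail_price of non-returned order items, grouped by category."""
    # Index products by id (assumed join key for order_items.product_id).
    products = {row["id"]: row for row in csv.DictReader(products_file)}
    totals = defaultdict(float)
    for item in csv.DictReader(order_items_file):
        if item["returned_at"]:
            continue  # skip items that carry a return timestamp
        product = products.get(item["product_id"])
        if product is None:
            continue  # skip items whose product is missing from products.csv
        totals[product["category"]] += float(product["retail_price"])
    return dict(totals)


totals = retail_value_by_category(
    io.StringIO(PRODUCTS_CSV), io.StringIO(ORDER_ITEMS_CSV)
)
print(totals)
```

The same pattern would apply to the other identifier columns (for example `products.distribution_center_id` against `distribution_centers.id`), again on the assumption that they are foreign keys.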

