10 datasets found
  1. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview

    TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation.

    The source dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/month of free-tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
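    The seven CSV files above form a small relational schema: order_items links to products, orders, users and inventory_items by the listed id columns. As a sketch of how they join, here is a minimal example using Python's built-in sqlite3 as a stand-in for BigQuery; the table and column names follow the schema above, but the sample rows are invented.

```python
import sqlite3

# In-memory database standing in for the CSVs loaded as tables.
# Column subset and names follow the schema listed above; rows are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE order_items (id INTEGER, order_id INTEGER, user_id INTEGER,
                          product_id INTEGER, status TEXT);
CREATE TABLE products (id INTEGER, category TEXT, retail_price REAL);
INSERT INTO order_items VALUES (1, 10, 100, 1, 'Complete'),
                               (2, 10, 100, 2, 'Complete'),
                               (3, 11, 101, 1, 'Returned');
INSERT INTO products VALUES (1, 'Jeans', 80.0), (2, 'Socks', 5.0);
""")

# Revenue by product category for completed order items.
rows = con.execute("""
    SELECT p.category, SUM(p.retail_price) AS revenue
    FROM order_items AS oi
    JOIN products AS p ON p.id = oi.product_id
    WHERE oi.status = 'Complete'
    GROUP BY p.category
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('Jeans', 80.0), ('Socks', 5.0)]
```

    The same GROUP BY/JOIN shape carries over to BigQuery Standard SQL against the hosted dataset, with table references swapped for the fully qualified BigQuery names.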
  2. OpenAIRE Graph Training for Scientometrics Research

    • data.europa.eu
    unknown
    Updated May 7, 2025
    Cite
    Zenodo (2025). OpenAIRE Graph Training for Scientometrics Research [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-13981535?locale=no
    Explore at:
    unknown (4694366). Available download formats
    Dataset updated
    May 7, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation for a hands-on training session designed to help participants learn or refine their skills in analysing OpenAIRE Graph data on Google Cloud with BigQuery. The workshop lasted 4 hours and alternated between presentations and hands-on practice with guidance from trainers. The training covered:

    • Introduction to Google Cloud and BigQuery
    • Introduction to the OpenAIRE Graph on BigQuery
    • Gentle introduction to SQL
    • Simple queries: walkthrough and exercises
    • Advanced queries (e.g., with JOINs and BigQuery functions): walkthrough and exercises
    • Data takeout + Python notebooks on Google BigQuery

  3. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
    zip (365940125 bytes). Available download formats
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

    Content: - Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data. - Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics. - Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

    Usage: - Business Analytics: Users can analyze business trends, sales performance, and financial metrics. - Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation. - Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

    Example Use Cases: - Sales Analysis: Track and analyze sales performance across different regions and time periods. - Inventory Management: Monitor inventory levels and identify trends in stock movements. - Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.

    Tables:

    • adr6.csv: Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    • adrc.csv: General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    • adrct.csv: Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    • adrt.csv: Address Details. Includes detailed address data such as street addresses, city, and country codes.
    • ankt.csv: Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    • anla.csv: Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    • bkpf.csv: Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    • bseg.csv: Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    • but000.csv: Business Partners. Contains basic information about business partners, including IDs and names.
    • but020.csv: Business Partner Addresses. Provides address details associated with business partners.
    • cepc.csv: Customer Master Data - Central. Contains centralized data for customer master records.
    • cepct.csv: Customer Master Data - Contact. Provides contact details associated with customer records.
    • csks.csv: Cost Center Master Data. Contains data about cost centers within the organization.
    • cskt.csv: Cost Center Texts. Provides text descriptions and labels for cost centers.
    • dd03l.csv: Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    • ekbe.csv: Purchase Order History. Details history of purchase orders, including quantities and values.
    • ekes.csv: Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    • eket.csv: Purchase Order Item History. Details changes and statuses for individual purchase order items.
    • ekkn.csv: Purchase Order Account Assignment. Provides account assignment details for purchas...
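    A common pattern with these tables is joining the accounting document header (bkpf) to its line items (bseg). The sketch below uses Python's built-in sqlite3; the join keys (company code, document number, fiscal year) are the usual SAP conventions, but the exact column names in these CSVs are an assumption and the sample rows are invented.

```python
import sqlite3

# Toy stand-in for bkpf.csv (document headers) and bseg.csv (line items).
# Column names (bukrs, belnr, gjahr, ...) are assumed, not taken from the CSVs.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bkpf (bukrs TEXT, belnr TEXT, gjahr TEXT, budat TEXT);
CREATE TABLE bseg (bukrs TEXT, belnr TEXT, gjahr TEXT,
                   buzei INTEGER, wrbtr REAL);
INSERT INTO bkpf VALUES ('1000', '0001', '2024', '2024-01-15');
INSERT INTO bseg VALUES ('1000', '0001', '2024', 1, 150.0),
                        ('1000', '0001', '2024', 2, 150.0);
""")

# Total line-item amount per accounting document.
rows = con.execute("""
    SELECT h.belnr, h.budat, SUM(i.wrbtr) AS total
    FROM bkpf AS h
    JOIN bseg AS i
      ON i.bukrs = h.bukrs AND i.belnr = h.belnr AND i.gjahr = h.gjahr
    GROUP BY h.belnr, h.budat
""").fetchall()
print(rows)  # [('0001', '2024-01-15', 300.0)]
```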
  4. Reddit

    • redivis.com
    application/jsonl +7
    Updated Oct 27, 2021
    Cite
    Redivis Demo Organization (2021). Reddit [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    sas, stata, csv, avro, parquet, spss, application/jsonl, arrowAvailable download formats
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Apr 12, 2006 - Aug 1, 2019
    Description

    Abstract

    Reddit posts, 2019-01-01 through 2019-08-01.

    Documentation

    Source: https://console.cloud.google.com/bigquery?p=fh-bigquery&page=project

  5. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Explore at:
    zip (0 bytes). Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unsplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    Image: https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

  6. Chicago Narcotics Crime Jan 2016 - Jul 2020

    • kaggle.com
    zip
    Updated Aug 2, 2020
    Cite
    Anugerah Erlaut (2020). Chicago Narcotics Crime Jan 2016 - Jul 2020 [Dataset]. https://www.kaggle.com/aerlaut/chicago-narcotics-jan-2016-jul-2020
    Explore at:
    zip (877003 bytes). Available download formats
    Dataset updated
    Aug 2, 2020
    Authors
    Anugerah Erlaut
    License

    https://www.usa.gov/government-works/

    Area covered
    Chicago
    Description

    Introduction

    Chicago is one of America's most iconic cities, with a rich and colorful history. Recently, Chicago was also the setting for one of Netflix's popular series: Ozark. The story has it that Chicago is the center of drug distribution for the Navarro cartel.

    So, how true is the series? A quick search on the internet turns up a recently released DEA report. The report shows that drug crime does exist in Chicago, although the drugs are distributed by the Cartel de Jalisco Nueva Generación, the Sinaloa Cartel and the Guerreros Unidos, to name a few.

    Content

    The government of the City of Chicago has provided a publicly available crime database accessible via Google BigQuery. I have downloaded a subset of the data with crime_type narcotics and year > 2015. The data contains records between 1 Jan 2016 UTC until 23 Jul 2020 UTC.

    The dataset contains these columns:

    • case_number: ID of the record
    • date: Date of the incident
    • iucr: Category of the crime, per the Illinois Uniform Crime Reporting (IUCR) code (more: https://data.cityofchicago.org/widgets/c7ck-438e)
    • description: More detailed description of the crime
    • location_description: Location of the crime
    • arrest: Whether an arrest was made
    • domestic: Whether the crime was domestic
    • district: Code of the district where the crime happened (more: https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Districts-current-/fthy-xz3r)
    • ward: Code of the ward where the crime happened (more: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2015-/sp34-6z76)
    • community_area: Code of the community area where the crime happened (more)
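    The seasonality questions below mostly reduce to grouping on the date column. A minimal sketch, using a few invented records shaped like the columns above:

```python
from collections import Counter
from datetime import datetime

# Invented records shaped like the dataset's date and arrest columns.
records = [
    {"date": "2016-01-05 22:10:00", "arrest": True},
    {"date": "2016-07-14 23:45:00", "arrest": True},
    {"date": "2017-07-02 01:30:00", "arrest": False},
]

# Seasonality check: incidents per calendar month, pooled across years.
by_month = Counter(
    datetime.strptime(r["date"], "%Y-%m-%d %H:%M:%S").month for r in records
)
print(by_month)  # Counter({7: 2, 1: 1})
```

    The same grouping by `.hour` or by district code answers the time-of-day and clustering questions.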

    Acknowledgements

    The data is owned and kindly provided by the City of Chicago.

    Inspiration

    Some questions to get you started:

    1. Is there a trend? Is the crime increasing? or decreasing?
    2. Is there seasonality? Are dealers more likely to be out and about in summer? Do they deal inside in winter?
    3. Are some activities more likely to happen at certain locations?
    4. We tend to think that more deals happen at night, especially as people wind down, and the surroundings get dark. Does the data reflect that?
    5. Are the incidents clustered to a certain district? Certain type of location?

    Lastly, if you are:

    • a newly recruited analyst at the DEA or the police, what would you recommend?
    • asked by el jefe del cartel (the boss of the cartel) how to expand the operation or run it better, what would you say?

    Happy wrangling!

  7. posts

    • redivis.com
    Updated Oct 24, 2025
    Cite
    Redivis Demo Organization (2025). posts [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 2019 - Aug 1, 2019
    Description

    The table posts is part of the dataset Reddit, available at https://redivis.com/datasets/prpw-49sqq9ehv. It contains 150,795,895 rows across 33 variables.

  8. Bitcoin Transactions by Type

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Robson Koji Moriya (2023). Bitcoin Transactions by Type [Dataset]. https://www.kaggle.com/datasets/robsonkoji/bitcoin-transactions-by-type
    Explore at:
    zip (441371 bytes). Available download formats
    Dataset updated
    Mar 28, 2023
    Authors
    Robson Koji Moriya
    License

    CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Transaction Types

    From an end-user perspective, here is a brief overview of the purpose of each transaction type:

    Pay-to-Public-Key-Hash (P2PKH) transaction: This is the most common type of transaction in Bitcoin, where the sender sends bitcoin to the recipient's Bitcoin address. P2PKH transactions are used for everyday transactions, such as buying goods or services.

    Pay-to-Script-Hash (P2SH) transaction: This type of transaction allows for more complex scripts to be used as the receiving address. P2SH transactions are used to enable advanced scripting features, such as multi-sig transactions and time-locked transactions.

    Multi-Signature (Multi-Sig) transaction: This type of transaction requires multiple signatures to authorize a transaction, making it more secure. Multi-sig transactions are used in situations where multiple parties need to approve a transaction, such as for joint accounts or high-value transactions.

    Segregated Witness (SegWit) transaction: This is a type of transaction that separates transaction signature data from the transaction data, reducing the size of the transaction and increasing transaction capacity. SegWit transactions are used to reduce fees and improve transaction speed.

    Lightning Network transaction: This is a layer 2 scaling solution that allows for instant and low-cost transactions by opening a payment channel between two parties. Lightning Network transactions are used for frequent and small-value transactions, such as micropayments and instant payments.

    Types List

    • null: Indicates that the output script is not recognized as a known type.
    • pubkey: Indicates a pay-to-public-key transaction.
    • pubkeyhash: Indicates a pay-to-public-key-hash (P2PKH) transaction.
    • multisig: Indicates a multisignature transaction.
    • nulldata: Indicates a null data transaction.
    • witness_v0_keyhash: Indicates a SegWit transaction using a pay-to-witness-public-key-hash (P2WPKH) script.
    • witness_v0_scripthash: Indicates a SegWit transaction using a pay-to-witness-script-hash (P2WSH) script.
    • witness_unknown: Indicates a SegWit transaction using an unknown script type.
    • scripthash: Indicates a pay-to-script-hash (P2SH) transaction.
    • nonstandard: Indicates a non-standard transaction.

    It's worth noting that this list may not cover every possible transaction type in the Bitcoin network, since there may be variations or new types of output scripts that are not yet recognized or categorized by the outputs.script_type field. Additionally, some complex transactions may use multiple output scripts of different types, which can complicate their categorization.
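    One way to relate the raw outputs.script_type values to the broader transaction families discussed earlier is a simple lookup table. The grouping below is a hypothetical mapping for illustration, and the counts in the usage example are invented:

```python
# Hypothetical grouping of outputs.script_type values into families.
FAMILY = {
    "pubkey": "legacy", "pubkeyhash": "legacy",
    "scripthash": "p2sh", "multisig": "multisig",
    "witness_v0_keyhash": "segwit", "witness_v0_scripthash": "segwit",
    "witness_unknown": "segwit",
    "nulldata": "other", "nonstandard": "other", "null": "other",
}

def family_shares(counts):
    """Collapse per-script-type counts into percentage shares per family."""
    totals = {}
    for script_type, n in counts.items():
        fam = FAMILY.get(script_type, "other")
        totals[fam] = totals.get(fam, 0) + n
    grand = sum(totals.values())
    return {fam: round(100 * n / grand, 1) for fam, n in totals.items()}

# Invented counts, only to show the shape of the result.
print(family_shares({"pubkeyhash": 80, "scripthash": 15,
                     "witness_v0_keyhash": 5}))
# {'legacy': 80.0, 'p2sh': 15.0, 'segwit': 5.0}
```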

    Distribution of Transaction Types

    The distribution of transaction types in the Bitcoin/Blockchain ecosystem can vary depending on the period analyzed and the specific data source used. However, here is a general overview of the distribution of transaction types in Bitcoin:

    Regular transactions (Pay-to-Public-Key-Hash or P2PKH transactions) are the most common type of transaction in the Bitcoin network. In some periods, regular transactions account for over 95% of all transactions in the network.

    Pay-to-Script-Hash (P2SH) transactions are the second most common type of transaction, accounting for around 3-4% of transactions.

    Multi-Signature (Multi-Sig) transactions, Segregated Witness (SegWit) transactions, and Lightning Network transactions together account for less than 1% of all transactions in the Bitcoin network.

    It's important to note that the distribution of transaction types can change over time as the Bitcoin network evolves and new features and technologies are introduced. Also, the distribution of transaction types can vary across different blockchain networks other than Bitcoin.

  9. subreddits

    • redivis.com
    Updated Oct 24, 2025
    Cite
    Redivis Demo Organization (2025). subreddits [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Apr 12, 2006 - Jun 29, 2013
    Description

    The table subreddits is part of the dataset Reddit, available at https://redivis.com/datasets/prpw-49sqq9ehv. It contains 2,499 rows across 7 variables.

  10. Customer Activity

    • kaggle.com
    zip
    Updated Nov 12, 2022
    Cite
    NW Analytics (2022). Customer Activity [Dataset]. https://www.kaggle.com/datasets/nwanalytics/customer-activity/code
    Explore at:
    zip (72684 bytes). Available download formats
    Dataset updated
    Nov 12, 2022
    Authors
    NW Analytics
    Description

    Context

    Assume you are a data analyst in an EdTech company. The company’s customer success team works with an objective to help customers get the maximum value from their product by doing deeper dives into the customer's needs, wants and expectations from the product and helping them reach their goals.

    The customer success team is aiming to achieve sustainable growth by focusing on retaining the existing users.

    Therefore, your team wants to analyze the activity of your existing users and understand their performance, behaviours, and patterns to gain meaningful insights that help your customer success team make data-informed decisions.

    Expected Outcome

    1. Brainstorm and identify the right metrics and frame proper questions for analysis. Your analysis should help your customer success team understand:
      • What is the current retention of the users?
      • How are they engaging with the content?
      • How efficiently are their discussions being resolved?
    2. In case you identify any outliers in the data set, make a note of them and exclude them from your analysis.
    3. Build the best suitable dashboard presenting your insights.

    Your recommendations must be backed by meaningful insights and professional visualizations which will help your customer success team design road maps, strategies, and action items to achieve the goal.

    Tools to use:

    1. Google Data Studio (preferred), Tableau, Power BI, or any other visualization tool
    2. You can use BigQuery SQL if you wish; it is not mandatory

    Overview of the Dataset

    The dataset contains the basic details of the enrolled users, their learning resource completion percentages, activities on the platform and the structure of learning resources available on the platform

    1. users_basic_details: Contains basic details of the enrolled users.

    2. day_wise_user_activity: Contains the details of the day-wise learning activity of the users.
      • A user shall have one entry for a lesson in a day.

    3. learning_resource_details: Contains the details of learning resources offered to the enrolled users.
      • Content is stored in a hierarchical structure: Track → Course → Topic → Lesson. A lesson can be a video, practice, exam, etc.
      • Example: Tech Foundations → Developer Foundations → Topic 1 → Lesson 1

    4. feedback_details: Contains the feedback details/rating given by the user to a particular lesson.
      • Feedback rating is given on a scale of 1 to 5, 5 being the highest.
      • A user can give feedback to the same lesson multiple times.

    5. discussion_details: Contains the details of the discussions created by the user for a particular lesson.

    6. discussion_comment_details: Contains the details of the comments posted for the discussions created by the user.
      • Comments may be posted by mentors or by the users themselves.
      • The role of mentors is to guide and help the users by resolving the doubts and issues they face in their learning activity.
      • A discussion can have multiple comments.

    Tables Description

    users_basic_details:

    • user_id: unique id of the user [string]
    • gender: gender of the enrolled user [string]
    • current_city: city of residence of the user [string]
    • batch_start_datetime: start datetime of the batch, for which the user is enrolled [datetime]
    • referral_source: referral channel of the user [string]
    • highest_qualification: highest qualification (education details) of the enrolled user [string]

    day_wise_user_activity:

    • activity_datetime: date and time of learning of the user [datetime]
    • user_id: unique id of the user [string]
    • lesson_id: unique id of the lesson [string]
    • lesson_type: type of the lesson. It can be "SESSION", "PRACTICE", "EXAM" or "PROJECT" [string]
    • day_completion_percentage: percent of the lesson completed by the user on a particular day (out of 100%) [float]
      • The completion percentage is calculated as: (duration of the lesson learnt on that day / total lesson duration) × 100
    • overall_completion_percentage: overall completion percentage of the lesson till date by the user (out of 100%) [float]

      • Example: If a user, who started a lesson on Jan 1, ’22 completes the lesson by learning it in parts (10%, 35%, 37%, 18% each day) on 4 different days, Then
        • Jan 1, ‘22 - day_completion_percentage - 10%, overall_completion_percentage - 10%
        • Jan 3, ‘22 - day_completion_percentage - 35%, overall_completion_percentage - 45%
        • Jan 4, ‘22 - day_completion_percentage - 37%, overall_completion_percentage - 82%
        • Jan 6, ‘22 - day_completion_percentage - 18%, overall_completion_percentage - 100%
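    In other words, overall_completion_percentage is the running sum of the day_completion_percentage values for a lesson, which reproduces the example above:

```python
from itertools import accumulate

# Day-wise completion percentages from the example above
# (Jan 1, Jan 3, Jan 4, Jan 6).
daily = [10, 35, 37, 18]

# overall_completion_percentage is the running sum of the daily values.
overall = list(accumulate(daily))
print(overall)  # [10, 45, 82, 100]
```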

    learning_resource_details:

    • track_id: unique id of the track [string]
    • track_title: name of the track [string]
    • course_id: unique id of the course [string]
    • **`...
