10 datasets found
  1. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview

    TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation.

    The source dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/month of free-tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
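    The seven CSV files above form a small relational schema: order_items links to products, orders, users and inventory_items by the listed id columns. As a sketch of how they join, here is a minimal example using Python's built-in sqlite3 as a stand-in for BigQuery; the table and column names follow the schema above, but the sample rows are invented.

```python
import sqlite3

# In-memory database standing in for the CSVs loaded as tables.
# Column subset and names follow the schema listed above; rows are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE order_items (id INTEGER, order_id INTEGER, user_id INTEGER,
                          product_id INTEGER, status TEXT);
CREATE TABLE products (id INTEGER, category TEXT, retail_price REAL);
INSERT INTO order_items VALUES (1, 10, 100, 1, 'Complete'),
                               (2, 10, 100, 2, 'Complete'),
                               (3, 11, 101, 1, 'Returned');
INSERT INTO products VALUES (1, 'Jeans', 80.0), (2, 'Socks', 5.0);
""")

# Revenue by product category for completed order items.
rows = con.execute("""
    SELECT p.category, SUM(p.retail_price) AS revenue
    FROM order_items AS oi
    JOIN products AS p ON p.id = oi.product_id
    WHERE oi.status = 'Complete'
    GROUP BY p.category
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('Jeans', 80.0), ('Socks', 5.0)]
```

    The same GROUP BY/JOIN shape carries over to BigQuery Standard SQL against the hosted dataset, with table references swapped for the fully qualified BigQuery names.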
  2. OpenAIRE Graph Training for Scientometrics Research

    • data.europa.eu
    unknown
    Updated May 7, 2025
    Cite
    Zenodo (2025). OpenAIRE Graph Training for Scientometrics Research [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-13981535?locale=no
    Explore at:
    unknown (4694366). Available download formats
    Dataset updated
    May 7, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation for a hands-on training session designed to help participants learn or refine their skills in analysing OpenAIRE Graph data on Google Cloud with BigQuery. The workshop lasted 4 hours and alternated between presentations and hands-on practice with guidance from trainers. The training covered:

    • Introduction to Google Cloud and BigQuery
    • Introduction to the OpenAIRE Graph on BigQuery
    • Gentle introduction to SQL
    • Simple queries: walkthrough and exercises
    • Advanced queries (e.g., with JOINs and BigQuery functions): walkthrough and exercises
    • Data takeout + Python notebooks on Google BigQuery

  3. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
    zip (365940125 bytes). Available download formats
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

    Content: - Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data. - Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics. - Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

    Usage: - Business Analytics: Users can analyze business trends, sales performance, and financial metrics. - Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation. - Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

    Example Use Cases: - Sales Analysis: Track and analyze sales performance across different regions and time periods. - Inventory Management: Monitor inventory levels and identify trends in stock movements. - Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.

    Tables:

    • adr6.csv: Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    • adrc.csv: General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    • adrct.csv: Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    • adrt.csv: Address Details. Includes detailed address data such as street addresses, city, and country codes.
    • ankt.csv: Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    • anla.csv: Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    • bkpf.csv: Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    • bseg.csv: Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    • but000.csv: Business Partners. Contains basic information about business partners, including IDs and names.
    • but020.csv: Business Partner Addresses. Provides address details associated with business partners.
    • cepc.csv: Customer Master Data - Central. Contains centralized data for customer master records.
    • cepct.csv: Customer Master Data - Contact. Provides contact details associated with customer records.
    • csks.csv: Cost Center Master Data. Contains data about cost centers within the organization.
    • cskt.csv: Cost Center Texts. Provides text descriptions and labels for cost centers.
    • dd03l.csv: Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    • ekbe.csv: Purchase Order History. Details history of purchase orders, including quantities and values.
    • ekes.csv: Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    • eket.csv: Purchase Order Item History. Details changes and statuses for individual purchase order items.
    • ekkn.csv: Purchase Order Account Assignment. Provides account assignment details for purchas...
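    A common pattern with these tables is joining the accounting document header (bkpf) to its line items (bseg). The sketch below uses Python's built-in sqlite3; the join keys (company code, document number, fiscal year) are the usual SAP conventions, but the exact column names in these CSVs are an assumption and the sample rows are invented.

```python
import sqlite3

# Toy stand-in for bkpf.csv (document headers) and bseg.csv (line items).
# Column names (bukrs, belnr, gjahr, ...) are assumed, not taken from the CSVs.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bkpf (bukrs TEXT, belnr TEXT, gjahr TEXT, budat TEXT);
CREATE TABLE bseg (bukrs TEXT, belnr TEXT, gjahr TEXT,
                   buzei INTEGER, wrbtr REAL);
INSERT INTO bkpf VALUES ('1000', '0001', '2024', '2024-01-15');
INSERT INTO bseg VALUES ('1000', '0001', '2024', 1, 150.0),
                        ('1000', '0001', '2024', 2, 150.0);
""")

# Total line-item amount per accounting document.
rows = con.execute("""
    SELECT h.belnr, h.budat, SUM(i.wrbtr) AS total
    FROM bkpf AS h
    JOIN bseg AS i
      ON i.bukrs = h.bukrs AND i.belnr = h.belnr AND i.gjahr = h.gjahr
    GROUP BY h.belnr, h.budat
""").fetchall()
print(rows)  # [('0001', '2024-01-15', 300.0)]
```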
  4. Reddit

    • redivis.com
    application/jsonl +7
    Updated Oct 27, 2021
    Cite
    Redivis Demo Organization (2021). Reddit [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    sas, stata, csv, avro, parquet, spss, application/jsonl, arrowAvailable download formats
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Apr 12, 2006 - Aug 1, 2019
    Description

    Abstract

    Reddit posts, 2019-01-01 through 2019-08-01.

    Documentation

    Source: https://console.cloud.google.com/bigquery?p=fh-bigquery&page=project

  5. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Explore at:
    zip (0 bytes). Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unsplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    Image: https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

  6. Chicago Narcotics Crime Jan 2016 - Jul 2020

    • kaggle.com
    zip
    Updated Aug 2, 2020
    Cite
    Anugerah Erlaut (2020). Chicago Narcotics Crime Jan 2016 - Jul 2020 [Dataset]. https://www.kaggle.com/aerlaut/chicago-narcotics-jan-2016-jul-2020
    Explore at:
    zip (877003 bytes). Available download formats
    Dataset updated
    Aug 2, 2020
    Authors
    Anugerah Erlaut
    License

    https://www.usa.gov/government-works/

    Area covered
    Chicago
    Description

    Introduction

    Chicago is one of America's most iconic cities, with a rich and colorful history. Recently, Chicago was also the setting for one of Netflix's popular series: Ozark. The story has it that Chicago is the center of drug distribution for the Navarro cartel.

    So, how true is the series? A quick search on the internet turns up a recently released DEA report. The report shows that drug crime does exist in Chicago, although the drugs are distributed by the Cartel de Jalisco Nueva Generación, the Sinaloa Cartel and the Guerreros Unidos, to name a few.

    Content

    The government of the City of Chicago has provided a publicly available crime database accessible via Google BigQuery. I have downloaded a subset of the data with crime_type narcotics and year > 2015. The data contains records between 1 Jan 2016 UTC until 23 Jul 2020 UTC.

    The dataset contains these columns:

    • case_number: ID of the record
    • date: Date of the incident
    • iucr: Category of the crime, per the Illinois Uniform Crime Reporting (IUCR) code (more: https://data.cityofchicago.org/widgets/c7ck-438e)
    • description: More detailed description of the crime
    • location_description: Location of the crime
    • arrest: Whether an arrest was made
    • domestic: Whether the crime was domestic
    • district: Code of the district where the crime happened (more: https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Districts-current-/fthy-xz3r)
    • ward: Code of the ward where the crime happened (more: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2015-/sp34-6z76)
    • community_area: Code of the community area where the crime happened (more)
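    The seasonality questions below mostly reduce to grouping on the date column. A minimal sketch, using a few invented records shaped like the columns above:

```python
from collections import Counter
from datetime import datetime

# Invented records shaped like the dataset's date and arrest columns.
records = [
    {"date": "2016-01-05 22:10:00", "arrest": True},
    {"date": "2016-07-14 23:45:00", "arrest": True},
    {"date": "2017-07-02 01:30:00", "arrest": False},
]

# Seasonality check: incidents per calendar month, pooled across years.
by_month = Counter(
    datetime.strptime(r["date"], "%Y-%m-%d %H:%M:%S").month for r in records
)
print(by_month)  # Counter({7: 2, 1: 1})
```

    The same grouping by `.hour` or by district code answers the time-of-day and clustering questions.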

    Acknowledgements

    The data is owned and kindly provided by the City of Chicago.

    Inspiration

    Some questions to get you started:

    1. Is there a trend? Is the crime increasing? or decreasing?
    2. Is there seasonality? Are dealers more likely to be out and about in summer? Do they deal inside in winter?
    3. Are some activities more likely to happen at certain locations?
    4. We tend to think that more deals happen at night, especially as people wind down, and the surroundings get dark. Does the data reflect that?
    5. Are the incidents clustered to a certain district? Certain type of location?

    Lastly, if you are:

    • a newly recruited analyst at the DEA or the police, what would you recommend?
    • asked by el jefe del cartel (the boss of the cartel) how to expand the operation or run it better, what would you say?

    Happy wrangling!

  7. posts

    • redivis.com
    Updated Oct 24, 2025
    Cite
    Redivis Demo Organization (2025). posts [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 2019 - Aug 1, 2019
    Description

    The table posts is part of the dataset Reddit, available at https://redivis.com/datasets/prpw-49sqq9ehv. It contains 150,795,895 rows across 33 variables.

  8. Bitcoin Transactions by Type

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Robson Koji Moriya (2023). Bitcoin Transactions by Type [Dataset]. https://www.kaggle.com/datasets/robsonkoji/bitcoin-transactions-by-type
    Explore at:
    zip (441371 bytes). Available download formats
    Dataset updated
    Mar 28, 2023
    Authors
    Robson Koji Moriya
    License

    CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Transaction Types

    From an end-user perspective, here is a brief overview of the purpose of each transaction type:

    Pay-to-Public-Key-Hash (P2PKH) transaction: This is the most common type of transaction in Bitcoin, where the sender sends bitcoin to the recipient's Bitcoin address. P2PKH transactions are used for everyday transactions, such as buying goods or services.

    Pay-to-Script-Hash (P2SH) transaction: This type of transaction allows for more complex scripts to be used as the receiving address. P2SH transactions are used to enable advanced scripting features, such as multi-sig transactions and time-locked transactions.

    Multi-Signature (Multi-Sig) transaction: This type of transaction requires multiple signatures to authorize a transaction, making it more secure. Multi-sig transactions are used in situations where multiple parties need to approve a transaction, such as for joint accounts or high-value transactions.

    Segregated Witness (SegWit) transaction: This is a type of transaction that separates transaction signature data from the transaction data, reducing the size of the transaction and increasing transaction capacity. SegWit transactions are used to reduce fees and improve transaction speed.

    Lightning Network transaction: This is a layer 2 scaling solution that allows for instant and low-cost transactions by opening a payment channel between two parties. Lightning Network transactions are used for frequent and small-value transactions, such as micropayments and instant payments.

    Types List

    • null: Indicates that the output script is not recognized as a known type.
    • pubkey: Indicates a pay-to-public-key transaction.
    • pubkeyhash: Indicates a pay-to-public-key-hash (P2PKH) transaction.
    • multisig: Indicates a multisignature transaction.
    • nulldata: Indicates a null data transaction.
    • witness_v0_keyhash: Indicates a SegWit transaction using a pay-to-witness-public-key-hash (P2WPKH) script.
    • witness_v0_scripthash: Indicates a SegWit transaction using a pay-to-witness-script-hash (P2WSH) script.
    • witness_unknown: Indicates a SegWit transaction using an unknown script type.
    • scripthash: Indicates a pay-to-script-hash (P2SH) transaction.
    • nonstandard: Indicates a non-standard transaction.

    It's worth noting that this list may not cover every possible transaction type in the Bitcoin network, since there may be variations or new types of output scripts that are not yet recognized or categorized by the outputs.script_type field. Additionally, some complex transactions may use multiple output scripts of different types, which can complicate their categorization.
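    One way to relate the raw outputs.script_type values to the broader transaction families discussed earlier is a simple lookup table. The grouping below is a hypothetical mapping for illustration, and the counts in the usage example are invented:

```python
# Hypothetical grouping of outputs.script_type values into families.
FAMILY = {
    "pubkey": "legacy", "pubkeyhash": "legacy",
    "scripthash": "p2sh", "multisig": "multisig",
    "witness_v0_keyhash": "segwit", "witness_v0_scripthash": "segwit",
    "witness_unknown": "segwit",
    "nulldata": "other", "nonstandard": "other", "null": "other",
}

def family_shares(counts):
    """Collapse per-script-type counts into percentage shares per family."""
    totals = {}
    for script_type, n in counts.items():
        fam = FAMILY.get(script_type, "other")
        totals[fam] = totals.get(fam, 0) + n
    grand = sum(totals.values())
    return {fam: round(100 * n / grand, 1) for fam, n in totals.items()}

# Invented counts, only to show the shape of the result.
print(family_shares({"pubkeyhash": 80, "scripthash": 15,
                     "witness_v0_keyhash": 5}))
# {'legacy': 80.0, 'p2sh': 15.0, 'segwit': 5.0}
```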

    Distribution of Transaction Types

    The distribution of transaction types in the Bitcoin/Blockchain ecosystem can vary depending on the period analyzed and the specific data source used. However, here is a general overview of the distribution of transaction types in Bitcoin:

    Regular transactions (Pay-to-Public-Key-Hash or P2PKH transactions) are the most common type of transaction in the Bitcoin network. In some periods, regular transactions account for over 95% of all transactions in the network.

    Pay-to-Script-Hash (P2SH) transactions are the second most common type of transaction, accounting for around 3-4% of transactions.

    Multi-Signature (Multi-Sig) transactions, Segregated Witness (SegWit) transactions, and Lightning Network transactions together account for less than 1% of all transactions in the Bitcoin network.

    It's important to note that the distribution of transaction types can change over time as the Bitcoin network evolves and new features and technologies are introduced. Also, the distribution of transaction types can vary across different blockchain networks other than Bitcoin.

  9. subreddits

    • redivis.com
    Updated Oct 24, 2025
    Cite
    Redivis Demo Organization (2025). subreddits [Dataset]. https://redivis.com/datasets/prpw-49sqq9ehv
    Explore at:
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Apr 12, 2006 - Jun 29, 2013
    Description

    The table subreddits is part of the dataset Reddit, available at https://redivis.com/datasets/prpw-49sqq9ehv. It contains 2,499 rows across 7 variables.

  10. Customer Activity

    • kaggle.com
    zip
    Updated Nov 12, 2022
    Cite
    NW Analytics (2022). Customer Activity [Dataset]. https://www.kaggle.com/datasets/nwanalytics/customer-activity/code
    Explore at:
    zip (72684 bytes). Available download formats
    Dataset updated
    Nov 12, 2022
    Authors
    NW Analytics
    Description

    Context

    Assume you are a data analyst in an EdTech company. The company’s customer success team works with an objective to help customers get the maximum value from their product by doing deeper dives into the customer's needs, wants and expectations from the product and helping them reach their goals.

    The customer success team is aiming to achieve sustainable growth by focusing on retaining the existing users.

    Therefore, your team wants to analyze the activity of your existing users and understand their performance, behaviours, and patterns to gain meaningful insights that help your customer success team make data-informed decisions.

    Expected Outcome

    1. Brainstorm and identify the right metrics and frame proper questions for analysis. Your analysis should help your customer success team understand:
      • What is the current retention of the users?
      • How are they engaging with the content?
      • How efficiently are their discussions being resolved?
    2. In case you identify any outliers in the data set, make a note of them and exclude them from your analysis.
    3. Build the best suitable dashboard presenting your insights.

    Your recommendations must be backed by meaningful insights and professional visualizations which will help your customer success team design road maps, strategies, and action items to achieve the goal.

    Tools to use:

    1. Google Data Studio (preferred), Tableau, Power BI, or any other visualization tool
    2. You can use BigQuery SQL if you wish; it is not mandatory

    Overview of the Dataset

    The dataset contains the basic details of the enrolled users, their learning resource completion percentages, activities on the platform and the structure of learning resources available on the platform

    1. users_basic_details: Contains basic details of the enrolled users.

    2. day_wise_user_activity: Contains the details of the day-wise learning activity of the users.
      • A user shall have one entry for a lesson in a day.

    3. learning_resource_details: Contains the details of learning resources offered to the enrolled users.
      • Content is stored in a hierarchical structure: Track → Course → Topic → Lesson. A lesson can be a video, practice, exam, etc.
      • Example: Tech Foundations → Developer Foundations → Topic 1 → Lesson 1

    4. feedback_details: Contains the feedback details/rating given by the user to a particular lesson.
      • Feedback rating is given on a scale of 1 to 5, 5 being the highest.
      • A user can give feedback to the same lesson multiple times.

    5. discussion_details: Contains the details of the discussions created by the user for a particular lesson.

    6. discussion_comment_details: Contains the details of the comments posted for the discussions created by the user.
      • Comments may be posted by mentors or by the users themselves.
      • The role of mentors is to guide and help the users by resolving the doubts and issues they face in their learning activity.
      • A discussion can have multiple comments.

    Tables Description

    users_basic_details:

    • user_id: unique id of the user [string]
    • gender: gender of the enrolled user [string]
    • current_city: city of residence of the user [string]
    • batch_start_datetime: start datetime of the batch, for which the user is enrolled [datetime]
    • referral_source: referral channel of the user [string]
    • highest_qualification: highest qualification (education details) of the enrolled user [string]

    day_wise_user_activity:

    • activity_datetime: date and time of learning of the user [datetime]
    • user_id: unique id of the user [string]
    • lesson_id: unique id of the lesson [string]
    • lesson_type: type of the lesson. It can be "SESSION", "PRACTICE", "EXAM" or "PROJECT" [string]
    • day_completion_percentage: percent of the lesson completed by the user on a particular day (out of 100%) [float]
      • The completion percentage is calculated as: (duration of the lesson learnt on that day / total lesson duration) × 100
    • overall_completion_percentage: overall completion percentage of the lesson till date by the user (out of 100%) [float]

      • Example: If a user, who started a lesson on Jan 1, ’22 completes the lesson by learning it in parts (10%, 35%, 37%, 18% each day) on 4 different days, Then
        • Jan 1, ‘22 - day_completion_percentage - 10%, overall_completion_percentage - 10%
        • Jan 3, ‘22 - day_completion_percentage - 35%, overall_completion_percentage - 45%
        • Jan 4, ‘22 - day_completion_percentage - 37%, overall_completion_percentage - 82%
        • Jan 6, ‘22 - day_completion_percentage - 18%, overall_completion_percentage - 100%
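    In other words, overall_completion_percentage is the running sum of the day_completion_percentage values for a lesson, which reproduces the example above:

```python
from itertools import accumulate

# Day-wise completion percentages from the example above
# (Jan 1, Jan 3, Jan 4, Jan 6).
daily = [10, 35, 37, 18]

# overall_completion_percentage is the running sum of the daily values.
overall = list(accumulate(daily))
print(overall)  # [10, 45, 82, 100]
```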

    learning_resource_details:

    • track_id: unique id of the track [string]
    • track_title: name of the track [string]
    • course_id: unique id of the course [string]
    • **`...
