52 datasets found
  1. SAP DATASET | BigQuery Dataset

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Mustafa Keser (2024). SAP DATASET | BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/sap-dataset-bigquery-dataset/discussion
    Explore at:
    zip (365940125 bytes)
    Dataset updated
    Aug 20, 2024
    Authors
    Mustafa Keser
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description


    Dataset Description: SAP Replicated Data

    Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA

    Overview: The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.

    Content: - Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data. - Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics. - Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

    Usage: - Business Analytics: Users can analyze business trends, sales performance, and financial metrics. - Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation. - Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

    Example Use Cases: - Sales Analysis: Track and analyze sales performance across different regions and time periods. - Inventory Management: Monitor inventory levels and identify trends in stock movements. - Financial Reporting: Generate financial reports and analyze expense patterns.

    For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.
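    As a rough illustration, here is a minimal sketch of querying one of the replicated tables with the BigQuery Python client library. The table name (bkpf, the accounting document header) comes from the file list below, but the column name used here (GJAHR for fiscal year) is an assumption based on standard SAP schemas, not something confirmed by this dataset's documentation.

      # Count accounting document headers per fiscal year.
      # Assumes access to cloud-training-demos.SAP_REPLICATED_DATA and that
      # bkpf keeps the standard SAP GJAHR (fiscal year) column.
      from google.cloud import bigquery

      client = bigquery.Client()  # requires a GCP project with BigQuery enabled
      query = """
      SELECT GJAHR AS fiscal_year, COUNT(*) AS documents
      FROM `cloud-training-demos.SAP_REPLICATED_DATA.bkpf`
      GROUP BY fiscal_year
      ORDER BY fiscal_year
      """
      for row in client.query(query).result():
          print(row.fiscal_year, row.documents)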

    Tables:


    File Name | Description
    adr6.csv | Addresses with organizational units. Contains address details related to organizational units like departments or branches.
    adrc.csv | General Address Data. Provides information about addresses, including details such as street, city, and postal codes.
    adrct.csv | Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses.
    adrt.csv | Address Details. Includes detailed address data such as street addresses, city, and country codes.
    ankt.csv | Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts.
    anla.csv | Asset Master Data. Contains information about fixed assets, including asset identification and classification.
    bkpf.csv | Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year.
    bseg.csv | Accounting Document Segment. Details line items within accounting documents, including account details and amounts.
    but000.csv | Business Partners. Contains basic information about business partners, including IDs and names.
    but020.csv | Business Partner Addresses. Provides address details associated with business partners.
    cepc.csv | Customer Master Data - Central. Contains centralized data for customer master records.
    cepct.csv | Customer Master Data - Contact. Provides contact details associated with customer records.
    csks.csv | Cost Center Master Data. Contains data about cost centers within the organization.
    cskt.csv | Cost Center Texts. Provides text descriptions and labels for cost centers.
    dd03l.csv | Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system.
    ekbe.csv | Purchase Order History. Details history of purchase orders, including quantities and values.
    ekes.csv | Purchasing Document History. Contains history of purchasing documents including changes and statuses.
    eket.csv | Purchase Order Item History. Details changes and statuses for individual purchase order items.
    ekkn.csv | Purchase Order Account Assignment. Provides account assignment details for purchas...

  2. Ecommerce_bigQuery

    • kaggle.com
    Updated Oct 1, 2024
    Cite
    Chirag Givan (2024). Ecommerce_bigQuery [Dataset]. https://www.kaggle.com/datasets/chiraggivan82/ecommerce-bigquery
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chirag Givan
    Description

    About this Dataset

    Ecommerce data is typically proprietary and not shared by private companies. However, this dataset is sourced from Google Cloud's BigQuery public data. It comes from the "thelook_ecommerce" dataset, which consists of seven tables.

    Content

    This dataset contains transactional data spanning 2019 to 2024, capturing the company's global consumer transactions. The company primarily sells a wide range of products, including clothing and accessories, catering to all age groups. The majority of its customers are based in the USA, China, and Brazil.

    Table Creation

    An additional table was created from the Events table to track user sessions in which a purchase was completed within the same session. This table includes details such as the date and time of the user's first interaction with the webpage, recorded as sequence number 1, as well as the date and time of the final purchase event, along with the corresponding sequence number for that session ID.
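    A minimal sketch of how such a session table could be derived, assuming the source is the public bigquery-public-data.thelook_ecommerce.events table and that it carries session_id, sequence_number, created_at, and event_type columns; this is an illustration, not the author's exact query.

      # One row per session that ended in a purchase: the first interaction
      # (sequence number 1) and the final purchase event with its sequence number.
      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT
        session_id,
        MIN(IF(sequence_number = 1, created_at, NULL)) AS first_interaction_at,
        MAX(IF(event_type = 'purchase', created_at, NULL)) AS purchase_at,
        MAX(IF(event_type = 'purchase', sequence_number, NULL)) AS purchase_sequence_number
      FROM `bigquery-public-data.thelook_ecommerce.events`
      GROUP BY session_id
      HAVING purchase_at IS NOT NULL
      """
      for row in client.query(query).result():
          print(row.session_id, row.first_interaction_at, row.purchase_at)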

  3. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first-shown and last-shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. It also contains a link to the Google Ads Transparency Center entry for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website.

    About BigQuery: this data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

    Download: this public dataset is also hosted in Google Cloud Storage and is free to use. The raw data is provided in JSON format, sharded across multiple files to support easier download of the large dataset. A README file describing the data structure and the Terms of Service is included with the dataset. You can also download the results from a custom query. Signed-out users can download the full dataset with the gcloud CLI: after installing it, remove the login requirement with "gcloud config set auth/disable_credentials True", then download the dataset with "gcloud storage cp gs://ads-transparency-center/* . -R".
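    As a hedged example, the query below counts ads by topic with the BigQuery Python client. The table path (bigquery-public-data.google_ads_transparency_center.creative_stats) and the topic column are assumptions inferred from this listing; check the dataset's schema in the BigQuery console before relying on them.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT topic, COUNT(*) AS ads
      FROM `bigquery-public-data.google_ads_transparency_center.creative_stats`
      GROUP BY topic
      ORDER BY ads DESC
      LIMIT 10
      """
      for row in client.query(query).result():
          print(row.topic, row.ads)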

  4. github_meta

    • huggingface.co
    Updated Aug 9, 2024
    Cite
    DeepGit (2024). github_meta [Dataset]. https://huggingface.co/datasets/deepgit/github_meta
    Explore at:
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    DeepGit
    License

    Open Software License 3.0 (https://choosealicense.com/licenses/osl-3.0/)

    Description

    Process to Generate DuckDB Dataset

      1. Load Repository Metadata
    

    Read repo_metadata.json from GitHub Public Repository Metadata and normalize the JSON into three lists:

    Repositories → general metadata (stars, forks, license, etc.)

    Languages → repo-language mappings with size

    Topics → repo-topic mappings

    Convert lists into Pandas DataFrames: df_repos, df_languages, df_topics.

      2. Enhance with BigQuery Data
    

    Create a temporary BigQuery table (repo_list)… See the full description on the dataset page: https://huggingface.co/datasets/deepgit/github_meta.
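    A minimal sketch of step 1, assuming repo_metadata.json holds one object per repository with nested languages and topics arrays of objects; the field name used here (nameWithOwner) is an illustrative assumption, not confirmed by the dataset card.

      import json
      import pandas as pd

      with open("repo_metadata.json") as f:
          repos = json.load(f)

      # Repositories → general metadata, one row per repository.
      df_repos = pd.json_normalize(repos)

      # Languages → repo-language mappings with size; Topics → repo-topic mappings.
      df_languages = pd.json_normalize(repos, record_path="languages", meta=["nameWithOwner"])
      df_topics = pd.json_normalize(repos, record_path="topics", meta=["nameWithOwner"])

      print(len(df_repos), len(df_languages), len(df_topics))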

  5. OpenStreetMap Public Dataset

    • console.cloud.google.com
    Updated Apr 23, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:OpenStreetMap&hl=de (2023). OpenStreetMap Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/openstreetmap/geo-openstreetmap?hl=de
    Explore at:
    Dataset updated
    Apr 23, 2023
    Dataset provided by
    OpenStreetMap (https://www.openstreetmap.org/)
    Google (http://google.com/)
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free, editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and now has more than two million registered users, who can add data by manual survey, GPS devices, aerial photography, and other free sources.

    We've made available a number of tables (explained in detail below):

    history_* tables: full history of OSM objects.

    planet_* tables: snapshot of current OSM objects as of Nov 2019.

    The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types, plus a changeset table corresponding to OSM edits for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated on with the built-in geography functions to perform geometry and feature selection and additional processing. Example analyses are given below.

    This dataset is part of a larger effort to make data available in BigQuery through the Google Cloud Public Datasets program. OSM itself is produced as a public good by volunteers, and there are no guarantees about data quality. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
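    For instance, here is a hedged sketch of a geography query with the BigQuery Python client: it counts cafe nodes within 5 km of a point. The table path (bigquery-public-data.geo_openstreetmap.planet_nodes) and the geometry and all_tags columns are assumptions based on the table groups described above.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT COUNT(*) AS cafes
      FROM `bigquery-public-data.geo_openstreetmap.planet_nodes`
      WHERE ST_DWITHIN(geometry, ST_GEOGPOINT(-0.1276, 51.5072), 5000)
        AND EXISTS (
          SELECT 1 FROM UNNEST(all_tags) AS tag
          WHERE tag.key = 'amenity' AND tag.value = 'cafe'
        )
      """
      print(list(client.query(query).result()))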

  6. Google Trends

    • console.cloud.google.com
    Updated Jun 11, 2022
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ES (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=ES
    Explore at:
    Dataset updated
    Jun 11, 2022
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google Search (http://google.com/)
    Google (http://google.com/)
    Description

    The Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. It simplifies manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. The dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends, made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days and is accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States.

    This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
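    A hedged sketch of pulling the latest top terms with the BigQuery Python client; the table path (bigquery-public-data.google_trends.top_terms) and the refresh_date, dma_name, term, and rank columns are assumptions based on the description above, to be verified in the BigQuery console.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT dma_name, term, rank, week
      FROM `bigquery-public-data.google_trends.top_terms`
      WHERE refresh_date = (SELECT MAX(refresh_date)
                            FROM `bigquery-public-data.google_trends.top_terms`)
        AND rank <= 5
      ORDER BY dma_name, rank
      """
      for row in client.query(query).result():
          print(row.dma_name, row.rank, row.term)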

  7. Stack Overflow Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
    Explore at:
    zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Stack Overflow (http://stackoverflow.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    Context

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

    Content

    Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: https://archive.org/download/stackexchange

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

    https://cloud.google.com/bigquery/public-data/stackoverflow

    Banner Photo by Caspar Rubin from Unsplash.

    Inspiration

    What is the percentage of questions that have been answered over the years?

    What is the reputation and badge count of users across different tenures on Stack Overflow?

    What are 10 of the “easier” gold badges to earn?

    Which day of the week has most questions answered within an hour?
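    As a starting point for the first question above, here is a minimal sketch using the BigQuery Python client against the public bigquery-public-data.stackoverflow.posts_questions table; the creation_date and answer_count columns are standard in that table, but treat the exact names as an assumption to verify.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT
        EXTRACT(YEAR FROM creation_date) AS year,
        ROUND(100 * COUNTIF(answer_count > 0) / COUNT(*), 1) AS pct_answered
      FROM `bigquery-public-data.stackoverflow.posts_questions`
      GROUP BY year
      ORDER BY year
      """
      for row in client.query(query).result():
          print(row.year, row.pct_answered)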

  8. COVID-19 Search Trends symptoms dataset

    • console.cloud.google.com
    Updated Jun 22, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=fr (2023). COVID-19 Search Trends symptoms dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=fr
    Explore at:
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google Search (http://google.com/)
    Google (http://google.com/)
    Description

    The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. It is intended to help researchers better understand the impact of COVID-19; it should not be used for medical diagnostic, prognostic, or treatment purposes, nor for guidance on personal travel plans. To learn more about the dataset, how it is generated, and how privacy is preserved, read the data documentation. To visualize the data, try exploring the interactive charts and map of symptom search trends.

    As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. The existing state/county tables will not be updated going forward.

    All bytes processed in queries against this dataset are zeroed out, making this part of the query free; data joined with the dataset is billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
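    A heavily hedged sketch of querying the country-level trends with the BigQuery Python client; the table path (bigquery-public-data.covid19_symptom_search.symptom_search_country_weekly) and the country_region and symptom_* columns are assumptions drawn from this description, so confirm them against the data documentation first.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT date, symptom_Fever, symptom_Cough
      FROM `bigquery-public-data.covid19_symptom_search.symptom_search_country_weekly`
      WHERE country_region = 'United Kingdom'
      ORDER BY date
      """
      for row in client.query(query).result():
          print(row.date, row.symptom_Fever, row.symptom_Cough)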

  9. (No) Influence of Continuous Integration on the Development Activity in...

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    + more versions
    Cite
    Baltes, Sebastian; Knack, Jascha (2020). (No) Influence of Continuous Integration on the Development Activity in GitHub Projects — Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1140260
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    University of Trier
    Authors
    Baltes, Sebastian; Knack, Jascha
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is based on the TravisTorrent dataset released 2017-01-11 (https://travistorrent.testroots.org), the Google BigQuery GHTorrent dataset accessed 2017-07-03, and the Git log history of all projects in the dataset, retrieved 2017-07-16 and 2017-07-17.

    We selected projects hosted on GitHub that employ the Continuous Integration (CI) system Travis CI. We identified the projects using the TravisTorrent data set and considered projects that:

    used GitHub from the beginning (first commit not more than seven days before project creation date according to GHTorrent),

    were active for at least one year (365 days) before the first build with Travis CI (before_ci),

    used Travis CI at least for one year (during_ci),

    had commit or merge activity on the default branch in both of these phases, and

    used the default branch to trigger builds.

    To derive the time frames, we employed the GHTorrent BigQuery data set. The resulting sample contains 113 projects. Of these projects, 89 are Ruby projects and 24 are Java projects. For our analysis, we only consider the activity one year before and after the first build.

    We cloned the selected project repositories and extracted the version history for all branches (see https://github.com/sbaltes/git-log-parser). For each repo and branch, we created one log file with all regular commits and one log file with all merges. We only considered commits changing non-binary files and applied a file extension filter to only consider changes to Java or Ruby source code files. From the log files, we then extracted metadata about the commits and stored this data in CSV files (see https://github.com/sbaltes/git-log-parser).

    We also retrieved a random sample of GitHub projects to validate the effects we observed in the CI project sample. We only considered projects that:

    have Java or Ruby as their project language

    used GitHub from the beginning (first commit not more than seven days before project creation date according to GHTorrent)

    have commit activity for at least two years (730 days)

    are engineered software projects (at least 10 watchers)

    were not in the TravisTorrent dataset

    In total, 8,046 projects satisfied those constraints. We drew a random sample of 800 projects from this sampling frame and retrieved the commit and merge data in the same way as for the CI sample. We then split the development activity at the median development date, removed projects without commits or merges in either of the two resulting time spans, and then manually checked the remaining projects to remove the ones with CI configuration files. The final comparison sample contained 60 non-CI projects.

    This dataset contains the following files:

    tr_projects_sample_filtered_2.csv A CSV file with information about the 113 selected projects.

    tr_sample_commits_default_branch_before_ci.csv tr_sample_commits_default_branch_during_ci.csv One CSV file with information about all commits to the default branch before and after the first CI build. Only commits modifying, adding, or deleting Java or Ruby source code files were considered. Those CSV files have the following columns:

    project: GitHub project name ("/" replaced by "_").
    branch: The branch to which the commit was made.
    hash_value: The SHA1 hash value of the commit.
    author_name: The author name.
    author_email: The author email address.
    author_date: The authoring timestamp.
    commit_name: The committer name.
    commit_email: The committer email address.
    commit_date: The commit timestamp.
    log_message_length: The length of the git commit message (in characters).
    file_count: Files changed with this commit.
    lines_added: Lines added to all files changed with this commit.
    lines_deleted: Lines deleted in all files changed with this commit.
    file_extensions: Distinct file extensions of files changed with this commit.

    tr_sample_merges_default_branch_before_ci.csv tr_sample_merges_default_branch_during_ci.csv One CSV file with information about all merges into the default branch before and after the first CI build. Only merges modifying, adding, or deleting Java or Ruby source code files were considered. Those CSV files have the following columns:

    project: GitHub project name ("/" replaced by "_").
    branch: The destination branch of the merge.
    hash_value: The SHA1 hash value of the merge commit.
    merged_commits: Unique hash value prefixes of the commits merged with this commit.
    author_name: The author name.
    author_email: The author email address.
    author_date: The authoring timestamp.
    commit_name: The committer name.
    commit_email: The committer email address.
    commit_date: The commit timestamp.
    log_message_length: The length of the git commit message (in characters).
    file_count: Files changed with this commit.
    lines_added: Lines added to all files changed with this commit.
    lines_deleted: Lines deleted in all files changed with this commit.
    file_extensions: Distinct file extensions of files changed with this commit.
    pull_request_id: ID of the GitHub pull request that has been merged with this commit (extracted from log message).
    source_user: GitHub login name of the user who initiated the pull request (extracted from log message).
    source_branch: Source branch of the pull request (extracted from log message).

    comparison_project_sample_800.csv A CSV file with information about the 800 projects in the comparison sample.

    commits_default_branch_before_mid.csv commits_default_branch_after_mid.csv One CSV file with information about all commits to the default branch before and after the median date of the commit history. Only commits modifying, adding, or deleting Java or Ruby source code files were considered. Those CSV files have the same columns as the commit tables described above.

    merges_default_branch_before_mid.csv merges_default_branch_after_mid.csv One CSV file with information about all merges into the default branch before and after the median date of the commit history. Only merges modifying, adding, or deleting Java or Ruby source code files were considered. Those CSV files have the same columns as the merge tables described above.
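    A minimal pandas sketch of how these files can be compared, assuming the two commit CSVs have been downloaded next to the script; the column names come from the schema listed above.

      import pandas as pd

      before = pd.read_csv("tr_sample_commits_default_branch_before_ci.csv")
      during = pd.read_csv("tr_sample_commits_default_branch_during_ci.csv")

      # Compare per-commit churn (lines added + deleted) before and during CI.
      for label, df in [("before_ci", before), ("during_ci", during)]:
          churn = df["lines_added"] + df["lines_deleted"]
          print(label, "commits:", len(df), "median churn:", churn.median())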

  10. Libraries.io Data

    • console.cloud.google.com
    Updated Aug 14, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Libraries.io&hl=zh-CN (2023). Libraries.io Data [Dataset]. https://console.cloud.google.com/marketplace/product/libraries-io/librariesio?hl=zh-CN
    Explore at:
    Dataset updated
    Aug 14, 2023
    Dataset provided by
    Libraries.io (https://libraries.io/)
    Google (http://google.com/)
    Description

    Libraries.io gathers data on open source software from 33 package managers and 3 source code repositories. We track over 2.4m unique open source projects, 25m repositories, and 121m interdependencies between them. This gives Libraries.io a unique understanding of open source software. In this release you will find data about software distributed and/or crafted publicly on the Internet: information about its development, its distribution, and its relationship with other software included as a dependency. You will not find any information about the individuals who create and maintain these projects.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  11. stackexchange

    • huggingface.co
    Cite
    Albert Gong, stackexchange [Dataset]. https://huggingface.co/datasets/ag2435/stackexchange
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Albert Gong
    Description

    StackExchange Dataset

    Working doc: https://docs.google.com/document/d/1h585bH5sYcQW4pkHzqWyQqA4ape2Bq6o1Cya0TkMOQc/edit?usp=sharing

    BigQuery query (see so_bigquery.ipynb):

    CREATE TEMP TABLE answers AS
    SELECT * FROM `bigquery-public-data.stackoverflow.posts_answers`
    WHERE LOWER(body) LIKE '%arxiv%';

    CREATE TEMP TABLE questions AS
    SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`;

    SELECT * FROM answers JOIN questions ON questions.id = answers.parent_id;

    NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ag2435/stackexchange.

  12. COVID-19 Open Data

    • console.cloud.google.com
    Updated Jun 22, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=fr (2023). COVID-19 Open Data [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-open-data?hl=fr
    Explore at:
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This repository contains the largest COVID-19 epidemiological database available, in addition to a powerful set of expansive covariates. It includes open-sourced data with a permissive license (enabling commercial use) relating to vaccinations, epidemiology, hospitalizations, demographics, economy, geography, health, mobility, government response, weather, and more. Moreover, the data merges daily time series from hundreds of data sources at a fine spatial resolution, containing over 20,000 locations and using a consistent set of region keys.

    This dataset is available in both the US and EU regions of BigQuery: COVID-19 Open Data: US Region and COVID-19 Open Data: EU Region. All data in this dataset is retrieved automatically; when possible, data is retrieved directly from the relevant authorities, like a country's ministry of health. This dataset has significant public interest in light of the COVID-19 crisis.

    All bytes processed in queries against this dataset are zeroed out, making this part of the query free; data joined with the dataset is billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
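    For example, here is a minimal sketch with the BigQuery Python client; the table path (bigquery-public-data.covid19_open_data.covid19_open_data) and the location_key, new_confirmed, and new_deceased columns are assumptions based on this description, so verify them in the BigQuery console.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT date, new_confirmed, new_deceased
      FROM `bigquery-public-data.covid19_open_data.covid19_open_data`
      WHERE location_key = 'FR'  -- country-level key; region keys are hierarchical
      ORDER BY date
      """
      for row in client.query(query).result():
          print(row.date, row.new_confirmed, row.new_deceased)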

  13. BigQuery sample Data set

    • kaggle.com
    zip
    Updated Nov 11, 2024
    Cite
    Ritu Barik (2024). BigQuery sample Data set [Dataset]. https://www.kaggle.com/ritubarik/bigquery-sample-data-set
    Explore at:
    zip (565 bytes)
    Dataset updated
    Nov 11, 2024
    Authors
    Ritu Barik
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Ritu Barik

    Released under Apache 2.0


  14. Global AIS-based Apparent Fishing Effort Dataset

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Mar 11, 2025
    Cite
    Global Fishing Watch (2025). Global AIS-based Apparent Fishing Effort Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14982711
    Explore at:
    Dataset updated
    Mar 11, 2025
    Authors
    Global Fishing Watch
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Overview

    This dataset contains version 3.0 (March 2025 release) of the Global Fishing Watch apparent fishing effort dataset. Data is available for 2012-2024 and based on positions of >190,000 unique automatic identification system (AIS) devices on fishing vessels, of which up to ~96,000 are active in a given year. Fishing vessels are identified via a machine learning model, vessel registry databases, and manual review by GFW and regional experts. Vessel time is measured in hours, calculated by assigning to each AIS position the amount of time elapsed since the previous AIS position of the vessel. The time is counted as apparent fishing hours if the GFW fishing detection model - a neural network machine learning model - determines the vessel is engaged in fishing behavior during that AIS position.

    Data are spatially binned into grid cells that measure 0.01 or 0.1 degrees on a side; the coordinates defining each cell are provided in decimal degrees (WGS84) and correspond to the lower-left corner. Data are available in the following formats:

    Daily apparent fishing hours by flag state and gear type at 100th degree resolution

    Monthly apparent fishing hours by flag state and gear type at 10th degree resolution

    Daily apparent fishing hours by MMSI at 10th degree resolution

    The fishing effort dataset is accompanied by a table of vessel information (e.g. gear type, flag state, dimensions).

    File structure

    Fishing effort and vessel presence data are available as .csv files in daily formats. Files for each year are stored in separate .zip files. A README.txt and schema.json file are provided for each dataset version and contain the table schema and additional information. There is also a README-known-issues-v3.txt file outlining some of the known issues with the version 3 release.

    Files are named according to the following convention:

    Daily file format:

    [fleet/mmsi]-daily-csvs-[100/10]-v3-[year].zip

    [fleet/mmsi]-daily-csvs-[100/10]-v3-[date].csv

    Monthly file format:

    fleet-monthly-csvs-10-v3-[year].zip

    fleet-monthly-csvs-10-v3-[date].csv

    Fishing vessel format: fishing-vessels-v3.csv

    README file format: README-[fleet/mmsi/fishing-vessels/known-issues]-v3.txt

    File identifiers:

    [fleet/mmsi]: Data by fleet (flag and geartype) or by MMSI

    [100/10]: 100th or 10th degree resolution

    [year]: Year of data included in .zip file

    [date]: Date of data included in .csv files. For monthly data, [date] corresponds to the first date of the month

    Examples: fleet-daily-csvs-100-v3-2020.zip; mmsi-daily-csvs-10-v3-2020-01-10.csv; fishing-vessels-v3.csv; README-fleet-v3.txt; fleet-monthly-csvs-10-v3-2024.zip; fleet-monthly-csvs-10-v3-2024-08-01.csv
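    A minimal pandas sketch of reading one daily fleet file named by this convention and ranking fleets by fishing hours; the file name follows the pattern above for an assumed date, and the column names (flag, geartype, fishing_hours) are assumptions to check against the published schema.json.

      import pandas as pd

      # One daily fleet file at 100th-degree resolution (v3 naming convention).
      df = pd.read_csv("fleet-daily-csvs-100-v3-2020-01-10.csv")

      # Top ten flag/gear combinations by total apparent fishing hours that day.
      top = (df.groupby(["flag", "geartype"])["fishing_hours"]
               .sum()
               .sort_values(ascending=False)
               .head(10))
      print(top)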

    Key documentation

    For an overview of how GFW turns raw AIS positions into estimates of fishing hours, see this page.

    The models used to produce this dataset were developed as part of this publication: D.A. Kroodsma, J. Mayorga, T. Hochberg, N.A. Miller, K. Boerder, F. Ferretti, A. Wilson, B. Bergman, T.D. White, B.A. Block, P. Woods, B. Sullivan, C. Costello, and B. Worm. "Tracking the global footprint of fisheries." Science 361.6378 (2018). Model details are available in the Supplementary Materials.

    The README-known-issues-v3.txt file describing this dataset's specific caveats can be downloaded from this page. We highly recommend that users read this file in full.

    The README-mmsi-v3.txt file, the README-fleet-v3.txt file, and the README-fishing-vessels-v3.txt files are downloadable from this page and contain the data description for (respectively) the fishing hours by MMSI dataset, the fishing hours by fleet dataset, and the vessel information file. These readmes contain key explanations about the gear types and flag states assigned to vessels in the dataset.

    The file name structure for the data files is described above, and the file schema can be downloaded from this page.

    A FAQ describing the updates in this version and the differences between this dataset and the data available from the GFW Map and APIs is available here.

    Use Cases

    The apparent fishing hours dataset is intended to allow users to analyze patterns of fishing across the world’s oceans at temporal scales as fine as daily and at spatial scales as fine as 0.1 or 0.01 degree cells. Fishing hours can be separated out by gear type, vessel flag and other characteristics of vessels such as tonnage.

    Potential applications for this dataset are broad. We offer suggested use cases to illustrate its utility. The dataset can be integrated as a static layer in multi-layered analyses, allowing researchers to investigate relationships between fishing effort and other variables, including biodiversity, tracking, and environmental data, as defined by their research objectives.

    A few example questions that these data could be used to answer:

    What flag states have fishing activity in my area of interest?

    Do hotspots of longline fishing overlap with known migration routes of sea turtles?

    How does fishing time by trawlers change by month in my area of interest? Which seasons see the most trawling hours and which see the least?

    Caveats

    This global dataset estimates apparent fishing effort in hours. The dataset is based on publicly available information and statistical classifications which may not fully capture the nuances of local fishing practices. While we manually review the dataset at a global scale and in a select set of smaller test regions to check for issues, given the scale of the dataset we are unable to manually review every fleet in every region. We recognize the potential for inaccuracies and encourage users to approach regional analyses with caution, utilizing their own regional expertise to validate findings. We welcome your feedback on any regional analysis at research@globalfishingwatch.org to enhance the dataset's accuracy.

    Caveats relating to known sources of inaccuracy as well as interpretation pitfalls to avoid are described in the README-known-issues-v3.txt file available for download from this page. We highly recommend that users read this file in full. The issues described include:

    Data from 2024 should be considered provisional, as vessel classifications may change as more data from 2025 becomes available.

    MMSI is used in this dataset as the vessel identifier. While MMSI is intended to serve as the unique AIS identifier for an individual vessel, this does not always hold in practice.

    The Maritime Identification Digits (MID), the first 3 digits of MMSI, are the only source of information on vessel flag state when the vessel does not appear on a registry. The MID may be entered incorrectly, obscuring information about an MMSI’s flag state.

    AIS reception is not consistent across all areas and changes over time.

    Alternative ways to access

    Query using SQL in the Global Fishing Watch public BigQuery dataset: global-fishing-watch.fishing_effort_v3

    Download the entire dataset from the Global Fishing Watch Data Download Portal (https://globalfishingwatch.org/data-download/datasets/public-fishing-effort)

  15. codeparrot

    • huggingface.co
    Updated Sep 1, 2021
    + more versions
    Cite
    Natural Language Processing with Transformers (2021). codeparrot [Dataset]. https://huggingface.co/datasets/transformersbook/codeparrot
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 1, 2021
    Dataset authored and provided by
    Natural Language Processing with Transformers
    Description

    CodeParrot 🦜 Dataset

      What is it?
    

    This is the full CodeParrot dataset. It contains Python files used to train the code generation model in Chapter 10: Training Transformers from Scratch in the NLP with Transformers book. You can find the full code in the accompanying GitHub repository.

      Creation
    

    It was created from the GitHub dataset available via Google BigQuery. It contains approximately 22 million Python files and is 180 GB (50 GB compressed) in size. The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.
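    A hedged sketch of the kind of BigQuery extraction that underlies such a corpus, using the public GitHub dataset's sample tables (the full tables were presumably used for CodeParrot itself); sample_files and sample_contents exist in bigquery-public-data.github_repos, but treat this as an illustration rather than the book's exact query.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT f.repo_name, f.path, c.content
      FROM `bigquery-public-data.github_repos.sample_files` AS f
      JOIN `bigquery-public-data.github_repos.sample_contents` AS c
        ON f.id = c.id
      WHERE f.path LIKE '%.py'
      LIMIT 10
      """
      for row in client.query(query).result():
          print(row.repo_name, row.path, len(row.content or ""))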

  16. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Explore at:
    zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unsplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    Image: https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
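    As a hedged sketch for the first question above, the query below tallies loud-party 311 complaints by address with the BigQuery Python client; the table path (bigquery-public-data.new_york.311_service_requests) and the complaint_type, descriptor, and incident_address columns are assumptions, and note that this dataset is deprecated.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT incident_address, COUNT(*) AS complaints
      FROM `bigquery-public-data.new_york.311_service_requests`
      WHERE complaint_type LIKE 'Noise%'
        AND descriptor LIKE '%Loud Music/Party%'
      GROUP BY incident_address
      ORDER BY complaints DESC
      LIMIT 10
      """
      for row in client.query(query).result():
          print(row.complaints, row.incident_address)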

  17. CMS Synthetic Patient Data OMOP

    • redivis.com
    application/jsonl +7
    Updated Aug 19, 2020
    + more versions
    Cite
    Redivis Demo Organization (2020). CMS Synthetic Patient Data OMOP [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
    Explore at:
    sas, avro, parquet, stata, application/jsonl, arrow, csv, spss
    Dataset updated
    Aug 19, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 2008 - Dec 31, 2010
    Description

    Abstract

    This is a synthetic patient dataset in the OMOP Common Data Model v5.2, originally released by the CMS and accessed via BigQuery. The dataset includes 24 tables and records for 2 million synthetic patients from 2008 to 2010.

    Methodology

    This dataset takes on the format of the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). As shown in the diagram below, the purpose of the Common Data Model is to convert various distinctly-formatted datasets into a well-known, universal format with a set of standardized vocabularies. See the diagram below from the Observational Health Data Sciences and Informatics (OHDSI) webpage.

    [Diagram: "Why CDM?", from the OHDSI webpage]

    Such universal data models ultimately enable researchers to streamline the analysis of observational medical data. For more information regarding the OMOP CDM, refer to the OHDSI OMOP site.

    Usage

    For documentation regarding the source data format from the Center for Medicare and Medicaid Services (CMS), refer to the CMS Synthetic Public Use File (https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF).

    For information regarding the conversion of the CMS data file to the OMOP CDM v5.2, refer to this OHDSI GitHub page (https://github.com/OHDSI/ETL-CMS).

    For information regarding each of the 24 tables in this dataset, including more detailed variable metadata, see the OHDSI CDM GitHub Wiki page (https://github.com/OHDSI/CommonDataModel/wiki). All variable labels and descriptions as well as table descriptions come from this Wiki page. Note that this GitHub page includes information primarily regarding the 6.0 version of the CDM and that this dataset works with the 5.2 version.
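    As a hedged illustration of working with OMOP CDM tables, the pandas sketch below assumes the person and condition_occurrence tables have been exported from Redivis as CSVs; the person_id, year_of_birth, and condition_occurrence_id columns follow the OMOP CDM spec, but the file names are hypothetical.

      import pandas as pd

      person = pd.read_csv("person.csv")                    # hypothetical export
      conditions = pd.read_csv("condition_occurrence.csv")  # hypothetical export

      # Count condition occurrences by the patients' decade of birth.
      merged = conditions.merge(person, on="person_id")
      merged["birth_decade"] = (merged["year_of_birth"] // 10) * 10
      print(merged.groupby("birth_decade")["condition_occurrence_id"].count())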

  18. Data from: Mining Rule Violations in JavaScript Code Snippets

    • zenodo.org
    csv
    Updated Jan 24, 2020
    Cite
    Uriel Ferreira Campos; Guilherme Smethurst; João Pedro Moraes; Rodrigo Bonifácio; Gustavo Pinto; Uriel Ferreira Campos; Guilherme Smethurst; João Pedro Moraes; Rodrigo Bonifácio; Gustavo Pinto (2020). Mining Rule Violations in JavaScript Code Snippets [Dataset]. http://doi.org/10.5281/zenodo.2593818
    Explore at:
    csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Uriel Ferreira Campos; Guilherme Smethurst; João Pedro Moraes; Rodrigo Bonifácio; Gustavo Pinto; Uriel Ferreira Campos; Guilherme Smethurst; João Pedro Moraes; Rodrigo Bonifácio; Gustavo Pinto
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Content of this repository
    This is the repository that contains the scripts and dataset for the MSR 2019 mining challenge

    GitHub repository with the software used: here.

    DATASET
    The dataset was retrieved using Google BigQuery and dumped to a CSV file for further processing. This original, untreated file is called jsanswers.csv. In it we can find the following information:
    1. The ID of the question (PostId)
    2. The content (in this case the code block)
    3. The length of the code block
    4. The line count of the code block
    5. The score of the post
    6. The title

    A quick look at these files shows that a PostId can have multiple rows related to it; that is how multiple code blocks are saved in the database.

    Filtered Dataset:

    Extracting code from CSV
    We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks into their respective JavaScript files, named by PostId. This resulted in 336 thousand files.

    Running ESlint
    Due to the single-threaded nature of ESLint, running it over 336 thousand files took a huge toll on the machine, so we created a script named "ESlintRunnerScript.py" (sketched below). It splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports, producing 20 JSON files.
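    A minimal sketch of that splitting approach, assuming the extracted .js files sit in one directory and a global eslint binary is on the PATH; the actual ESlintRunnerScript.py may differ.

      import glob
      import subprocess

      files = sorted(glob.glob("extracted/*.js"))
      n_parts = 20
      chunks = [files[i::n_parts] for i in range(n_parts)]

      procs, outs = [], []
      for i, chunk in enumerate(chunks):
          # One ESLint process per chunk, each writing a JSON report.
          # (For very large chunks, pass a directory instead of file names
          # to stay under the OS argument-length limit.)
          out = open(f"report-{i}.json", "w")
          outs.append(out)
          procs.append(subprocess.Popen(
              ["eslint", "--format", "json", *chunk], stdout=out))
      for p, out in zip(procs, outs):
          p.wait()
          out.close()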

    Number of Violations per Rule
    This information was extracted using the script named "parser.py", which generated the file "NumberofViolationsPerRule.csv" containing the number of violations per rule used in the linter configuration.

    Number of violations per Category
    To produce relevant statistics about the dataset, we generated the number of violations per rule category as defined on the ESLint website; this information was extracted using the same "parser.py" script.

    Individual Reports
    This information was extracted from the JSON reports; it is a CSV file with PostId and violations per rule.

    Rules
    The file Rules with categories contains all the rules used and their categories.

  19. Ethereum Blockchain

    • kaggle.com
    zip
    Updated Mar 4, 2019
    Cite
    Google BigQuery (2019). Ethereum Blockchain [Dataset]. https://www.kaggle.com/datasets/bigquery/ethereum-blockchain
    Explore at:
    zip (0 bytes)
    Dataset updated
    Mar 4, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    Bitcoin and other cryptocurrencies have captured the imagination of technologists, financiers, and economists. Digital currencies are only one application of the underlying blockchain technology. Like its predecessor, Bitcoin, the Ethereum blockchain can be described as an immutable distributed ledger. However, creator Vitalik Buterin also extended the set of capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.

    Both Bitcoin and Ethereum are essentially OLTP databases, and provide little in the way of OLAP (analytics) functionality. However the Ethereum dataset is notably distinct from the Bitcoin dataset:

    • The Ethereum blockchain has as its primary unit of value Ether, while the Bitcoin blockchain has Bitcoin. However, the majority of value transfer on the Ethereum blockchain is composed of so-called tokens. Tokens are created and managed by smart contracts.

    • Ether value transfers are precise and direct, resembling accounting ledger debits and credits. This is in contrast to the Bitcoin value transfer mechanism, for which it can be difficult to determine the balance of a given wallet address.

    • Addresses can be not only wallets that hold balances, but can also contain smart contract bytecode that allows the programmatic creation of agreements and automatic triggering of their execution. An aggregate of coordinated smart contracts could be used to build a decentralized autonomous organization.

    Content

    The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.

    Our hope is that by making the data on public blockchain systems more readily available it promotes technological innovation and increases societal benefits.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum.[TABLENAME]. Fork this kernel to get started.
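    For example, here is a minimal sketch with the BigQuery Python client that ranks tokens by transfer count on one day; crypto_ethereum.token_transfers and its token_address and block_timestamp columns exist in the public dataset, but treat the details as an illustration to verify.

      from google.cloud import bigquery

      client = bigquery.Client()
      query = """
      SELECT token_address, COUNT(*) AS transfers
      FROM `bigquery-public-data.crypto_ethereum.token_transfers`
      WHERE DATE(block_timestamp) = '2019-01-01'
      GROUP BY token_address
      ORDER BY transfers DESC
      LIMIT 10
      """
      for row in client.query(query).result():
          print(row.transfers, row.token_address)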

    Acknowledgements

    Cover photo by Thought Catalog on Unsplash

    Inspiration

    • What are the most popularly exchanged digital tokens, represented by ERC-721 and ERC-20 smart contracts?
    • Compare transaction volume and transaction networks over time
    • Compare transaction volume to historical prices by joining with other available data sources like Bitcoin Historical Data

  20. ListenBrainz

    • console.cloud.google.com
    Updated Jul 26, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:MetaBrainz&hl=it (2023). ListenBrainz [Dataset]. https://console.cloud.google.com/marketplace/product/metabrainz/listenbrainz?hl=it
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    MetaBrainz Foundation (http://metabrainz.org/)
    Google (http://google.com/)
    Description

    The ListenBrainz data set is time-series music metadata for users who report the music to which they have recently listened. Artist, album, recording, listened-at timestamp, and ancillary data submitted to the ListenBrainz server is streamed live to BigQuery and should appear within a few seconds of submission to ListenBrainz. The data gathered and published via BigQuery is defined in the ListenBrainz API documentation. This data is a valuable resource for discovering users' music listening habits and for creating music recommendation systems.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
