MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset ID: cloud-training-demos.SAP_REPLICATED_DATA
Overview:
The SAP_REPLICATED_DATA dataset in BigQuery provides a comprehensive replication of SAP (Systems, Applications, and Products in Data Processing) business data. This dataset is designed to support data analytics and machine learning tasks by offering a rich set of structured data that mimics real-world enterprise scenarios. It includes data from various SAP modules and processes, enabling users to perform in-depth analysis, build predictive models, and explore business insights.
Content:
- Tables and Schemas: The dataset consists of multiple tables representing different aspects of SAP business operations, including but not limited to sales, inventory, finance, and procurement data.
- Data Types: It contains structured data with fields such as transaction IDs, timestamps, customer details, product information, sales figures, and financial metrics.
- Data Volume: The dataset is designed to simulate large-scale enterprise data, making it suitable for performance testing, data processing, and analysis.

Usage:
- Business Analytics: Users can analyze business trends, sales performance, and financial metrics.
- Machine Learning: Ideal for developing and testing machine learning models related to business forecasting, anomaly detection, and customer segmentation.
- Data Processing: Suitable for practicing SQL queries, data transformation, and integration tasks.

Example Use Cases:
- Sales Analysis: Track and analyze sales performance across different regions and time periods.
- Inventory Management: Monitor inventory levels and identify trends in stock movements.
- Financial Reporting: Generate financial reports and analyze expense patterns.
For more information and to access the dataset, visit the BigQuery public datasets page or refer to the dataset documentation in the BigQuery console.
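As a starting point, here is a hedged example query joining accounting document headers (bkpf) to line items (bseg, both described in the file table below), assuming the replicated tables keep SAP's standard key columns (mandt, bukrs, belnr, gjahr); verify the schema in the BigQuery console before use:

-- Count line items per accounting document (SAP key columns assumed)
SELECT
h.bukrs AS company_code,
h.belnr AS document_number,
h.gjahr AS fiscal_year,
COUNT(*) AS line_item_count
FROM `cloud-training-demos.SAP_REPLICATED_DATA.bkpf` AS h
JOIN `cloud-training-demos.SAP_REPLICATED_DATA.bseg` AS s
ON s.mandt = h.mandt
AND s.bukrs = h.bukrs
AND s.belnr = h.belnr
AND s.gjahr = h.gjahr
GROUP BY company_code, document_number, fiscal_year
ORDER BY line_item_count DESC
LIMIT 20;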
The files included in the dataset:
| File Name | Description |
|---|---|
| adr6.csv | Addresses with organizational units. Contains address details related to organizational units like departments or branches. |
| adrc.csv | General Address Data. Provides information about addresses, including details such as street, city, and postal codes. |
| adrct.csv | Address Contact Information. Contains contact information linked to addresses, including phone numbers and email addresses. |
| adrt.csv | Address Details. Includes detailed address data such as street addresses, city, and country codes. |
| ankt.csv | Accounting Document Segment. Provides details on segments within accounting documents, including account numbers and amounts. |
| anla.csv | Asset Master Data. Contains information about fixed assets, including asset identification and classification. |
| bkpf.csv | Accounting Document Header. Contains headers of accounting documents, such as document numbers and fiscal year. |
| bseg.csv | Accounting Document Segment. Details line items within accounting documents, including account details and amounts. |
| but000.csv | Business Partners. Contains basic information about business partners, including IDs and names. |
| but020.csv | Business Partner Addresses. Provides address details associated with business partners. |
| cepc.csv | Customer Master Data - Central. Contains centralized data for customer master records. |
| cepct.csv | Customer Master Data - Contact. Provides contact details associated with customer records. |
| csks.csv | Cost Center Master Data. Contains data about cost centers within the organization. |
| cskt.csv | Cost Center Texts. Provides text descriptions and labels for cost centers. |
| dd03l.csv | Data Element Field Labels. Contains labels and descriptions for data fields in the SAP system. |
| ekbe.csv | Purchase Order History. Details history of purchase orders, including quantities and values. |
| ekes.csv | Purchasing Document History. Contains history of purchasing documents including changes and statuses. |
| eket.csv | Purchase Order Item History. Details changes and statuses for individual purchase order items. |
| ekkn.csv | Purchase Order Account Assignment. Provides account assignment details for purchase order items. |
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.
gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.
github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.
github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.
natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.
shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.
trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.
wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.
Fork this kernel to get started.
Data Source: https://cloud.google.com/bigquery/sample-tables
Banner Photo by Mervyn Chan from Unsplash.
How many babies were born in New York City on Christmas Day?
How many words are in the play Hamlet?
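For example, the Hamlet question can be answered with a short query against the shakespeare sample table (assuming the standard-SQL path `bigquery-public-data.samples.shakespeare` and its lowercase corpus names):

-- Total words in Hamlet: sum the per-word counts for the 'hamlet' corpus
SELECT SUM(word_count) AS total_words
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = 'hamlet';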
Open Software License 3.0 (https://choosealicense.com/licenses/osl-3.0/)
Process to Generate DuckDB Dataset
1. Load Repository Metadata
Read repo_metadata.json from GitHub Public Repository Metadata. Normalize the JSON into three lists:
- Repositories: general metadata (stars, forks, license, etc.).
- Languages: repo-language mappings with size.
- Topics: repo-topic mappings.
Convert lists into Pandas DataFrames: df_repos, df_languages, df_topics.
2. Enhance with BigQuery Data
Create a temporary BigQuery table (repo_list)… See the full description on the dataset page: https://huggingface.co/datasets/deepgit/github_meta.
This dataset contains two tables: creative_stats and removed_creative_stats.

The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first-shown and last-shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided.

The removed_creative_stats table contains information about ads served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. It also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website.

About BigQuery: This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing, meaning each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

Download Dataset: This public dataset is also hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query; see here for options and instructions. Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions here to download and install the gcloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True". To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R".
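As a hedged starting point, here is a query counting ads by topic, assuming the dataset is exposed as `bigquery-public-data.google_ads_transparency_center.creative_stats` and that the ad topic lives in a `topic` column (both names should be verified in the BigQuery console):

-- Ad count per topic (dataset path and column name are assumptions)
SELECT topic, COUNT(*) AS ad_count
FROM `bigquery-public-data.google_ads_transparency_center.creative_stats`
GROUP BY topic
ORDER BY ad_count DESC;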
Ecommerce data is typically proprietary and not shared by private companies. However, this dataset is sourced from Google Cloud's BigQuery public data. It comes from the "thelook_ecommerce" dataset, which consists of seven tables.
This dataset contains transactional data spanning 2019 to 2024, capturing the company's global consumer transactions. The company primarily sells a wide range of products, including clothing and accessories, catering to all age groups. The majority of its customers are based in the USA, China, and Brazil.

An additional data table was created from the Events table to track user sessions in which a purchase was completed within the same session. This table includes details such as the date and time of the user's first interaction with the webpage, recorded as sequence number 1, as well as the date and time of the final purchase event, along with the corresponding sequence number for that session ID. A sketch of how this could be done is shown below.
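A minimal sketch of deriving such a session table, assuming the public path `bigquery-public-data.thelook_ecommerce.events` and its session_id, sequence_number, created_at, and event_type columns:

-- One row per session that contains a purchase event
SELECT
session_id,
-- First interaction: the event recorded with sequence number 1
MIN(IF(sequence_number = 1, created_at, NULL)) AS first_interaction_at,
-- Final purchase event in the session (assumes event_type = 'purchase')
MAX(IF(event_type = 'purchase', created_at, NULL)) AS purchase_at,
MAX(IF(event_type = 'purchase', sequence_number, NULL)) AS purchase_sequence_number
FROM `bigquery-public-data.thelook_ecommerce.events`
GROUP BY session_id
HAVING purchase_at IS NOT NULL;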
StackExchange Dataset
Working doc: https://docs.google.com/document/d/1h585bH5sYcQW4pkHzqWyQqA4ape2Bq6o1Cya0TkMOQc/edit?usp=sharing
BigQuery query (see so_bigquery.ipynb):
CREATE TEMP TABLE answers AS
SELECT * FROM `bigquery-public-data.stackoverflow.posts_answers`
WHERE LOWER(body) LIKE '%arxiv%';
CREATE TEMP TABLE questions AS
SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`;
SELECT * FROM answers
JOIN questions ON questions.id = answers.parent_id;
NOTE: … See the full description on the dataset page: https://huggingface.co/datasets/ag2435/stackexchange.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Labeled datasets are useful in machine learning research.
This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.
Tables: 1) annotations_bbox 2) dict 3) images 4) labels
Update Frequency: Quarterly
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:open_images
https://cloud.google.com/bigquery/public-data/openimages
APA-style citation: Google Research (2016). The Open Images dataset [Image urls and labels]. Available from GitHub: https://github.com/openimages/dataset.
Use: The annotations are licensed by Google Inc. under CC BY 4.0 license.
The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
Banner Photo by Mattias Diesel from Unsplash.
Which labels are in the dataset?
Which labels have "bus" in their display names?
How many images of a trolleybus are in the dataset?
What are some landing pages of images with a trolleybus?
Which images with cherries are in the training set?
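For instance, the "bus" question maps to a simple lookup against the dict table, assuming its label_name and label_display_name columns (worth verifying in the table schema):

-- Labels whose display name mentions "bus"
SELECT label_name, label_display_name
FROM `bigquery-public-data.open_images.dict`
WHERE label_display_name LIKE '%bus%';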
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and now has more than two million registered users who can add data by manual survey, GPS devices, aerial photography, and other free sources.

We've made available a number of tables (explained in detail below):
- history_* tables: full history of OSM objects.
- planet_* tables: snapshot of current OSM objects as of Nov 2019.

The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types, plus an additional changeset table corresponding to OSM edits, for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated upon with the built-in geography functions to perform geometry and feature selection and additional processing. Example analyses are given below.

This dataset is part of a larger effort to make data available in BigQuery through the Google Cloud Public Datasets program. OSM itself is produced as a public good by volunteers, and there are no guarantees about data quality. Interested in learning more about how these data were brought into BigQuery and how you can use them? Check out the sample queries below to get started.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Bitcoin and other cryptocurrencies have captured the imagination of technologists, financiers, and economists. Digital currencies are only one application of the underlying blockchain technology. Like its predecessor, Bitcoin, the Ethereum blockchain can be described as an immutable distributed ledger. However, creator Vitalik Buterin also extended the set of capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.
Both Bitcoin and Ethereum are essentially OLTP databases and provide little in the way of OLAP (analytics) functionality. However, the Ethereum dataset is notably distinct from the Bitcoin dataset:
The Ethereum blockchain's primary unit of value is Ether, while the Bitcoin blockchain's is Bitcoin. However, the majority of value transfer on the Ethereum blockchain is composed of so-called tokens, which are created and managed by smart contracts.
Ether value transfers are precise and direct, resembling accounting ledger debits and credits. This is in contrast to the Bitcoin value transfer mechanism, for which it can be difficult to determine the balance of a given wallet address.
Addresses can be not only wallets that hold balances, but can also contain smart contract bytecode that allows the programmatic creation of agreements and automatic triggering of their execution. An aggregate of coordinated smart contracts could be used to build a decentralized autonomous organization.
The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.
Our hope is that by making data on public blockchain systems more readily available, we can promote technological innovation and increase societal benefit.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum.[TABLENAME]. Fork this kernel to get started.
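A small example of the kind of OLAP query this enables, assuming the transactions table's block_timestamp column:

-- Daily transaction counts for the most recent 30 days in the dataset
SELECT DATE(block_timestamp) AS day, COUNT(*) AS tx_count
FROM `bigquery-public-data.crypto_ethereum.transactions`
GROUP BY day
ORDER BY day DESC
LIMIT 30;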
Cover photo by Thought Catalog on Unsplash
The Metropolitan Museum of Art, better known as the Met, provides a public domain dataset with over 200,000 objects, including metadata and images. In early 2017, the Met debuted their Open Access policy to make part of their collection freely available for unrestricted use under the Creative Commons Zero designation and their own terms and conditions. This dataset provides a new view into one of the world's premier collections of fine art. The data includes both images in Google Cloud Storage and associated structured data in two BigQuery tables, objects and images (1:N). Locations of images on both the Met's website and in Google Cloud Storage are available in the BigQuery table. The metadata for this public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. The image data for this public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
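A hedged starter query, assuming the objects table lives at `bigquery-public-data.the_met.objects` and carries a department column:

-- Object counts per curatorial department
SELECT department, COUNT(*) AS object_count
FROM `bigquery-public-data.the_met.objects`
GROUP BY department
ORDER BY object_count DESC;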
Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and now has more than two million registered users who can add data by manual survey, GPS devices, aerial photography, and other free sources.
To aid researchers, data scientists, and analysts in the effort to combat COVID-19, Google is making a hosted repository of public datasets, including OpenStreetMap data, free to access. To make it easier for the Kaggle community to access the BigQuery dataset, it has been onboarded to the Kaggle platform, which allows querying it without a linked GCP account. Please note that due to the large size of the dataset, Kaggle applies a quota of 5 TB of data scanned per user per 30 days.
This is the OpenStreetMap (OSM) planet-wide dataset loaded to BigQuery.
Tables:
- history_* tables: full history of OSM objects.
- planet_* tables: snapshot of current OSM objects as of Nov 2019.
The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types, plus an additional changeset table corresponding to OSM edits, for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated upon with the built-in geography functions to perform geometry and feature selection and additional processing.
You can read more about OSM elements on the OSM Wiki. This dataset uses the BigQuery GEOGRAPHY data type, which supports a set of functions that can be used to analyze geographical data, determine spatial relationships between geographical features, and construct or manipulate GEOGRAPHYs.
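As a sketch of the GEOGRAPHY functions in action, here is a query counting nodes within 1 km of a point, assuming a planet_nodes table with a GEOGRAPHY column named geometry under `bigquery-public-data.geo_openstreetmap` (verify names before running; the table is large, so such a query can scan a lot of data against the quota):

-- Count OSM nodes within 1 km of lower Manhattan
SELECT COUNT(*) AS nearby_nodes
FROM `bigquery-public-data.geo_openstreetmap.planet_nodes`
WHERE ST_DWITHIN(geometry, ST_GEOGPOINT(-74.0060, 40.7128), 1000);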
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
- Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
- Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
- Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
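For example, the first question can be answered with a short query against the public sample (assuming the standard `bigquery-public-data.google_analytics_sample.ga_sessions_*` export tables):

-- Total transactions per device browser in July 2017
SELECT device.browser, SUM(totals.transactions) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY device.browser
ORDER BY total_transactions DESC;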
The data was modified by me in SQL to remove drug enforcement records that did not get terminated. The original data is at: console.cloud.google.com/bigquery?ws=!1m5!1m4!4m3!1sbigquery-public-data!2sfda_drug!3sdrug_enforcement

Table info:
- Table ID: bigquery-public-data.fda_drug.drug_enforcement
- Created: Jul 14, 2017, 10:55:58 PM UTC+5:30
- Last modified: Jun 25, 2023, 5:38:39 AM UTC+5:30
- Table expiration: NEVER
- Data location: US
- Default collation: (not set)
- Default rounding mode: ROUNDING_MODE_UNSPECIFIED
- Case insensitive: false
- Description, Labels, Primary key(s): (not set)
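A sketch of the filtering step described above, assuming the status column on the public table holds values such as 'Terminated' (worth verifying against the live schema):

-- Keep only enforcement records whose recall status is Terminated
SELECT *
FROM `bigquery-public-data.fda_drug.drug_enforcement`
WHERE status = 'Terminated';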
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
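A hedged first look at the summary view, assuming the data is exposed as `bigquery-public-data.sec_quarterly_financials` with the summary in a quick_summary table (dataset and table names are assumptions; verify in the BigQuery console):

-- Preview the pre-joined quick summary view
SELECT *
FROM `bigquery-public-data.sec_quarterly_financials.quick_summary`
LIMIT 10;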
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs represents the use of different types of Fitbit trackers and individual tracking behaviors / preferences.
This is the list of manipulations performed on the original dataset, published by Möbius.
All the cleaning process and rearrangements were performed in BigQuery, using SQL functions.
1) After I took a closer look at the source dataset, I realized that for my case study I did not need some of the tables contained in the original archive. Therefore, I decided not to import
- dailyCalories_merged.csv,
- dailyIntensities_merged.csv,
- dailySteps_merged.csv,
as they proved redundant: their content can be found in the dailyActivity_merged.csv file.
In addition, the files
- minutesCaloriesWide_merged.csv,
- minutesIntensitiesWide_merged.csv,
- minuteStepsWide_merged.csv
were not imported, as they present the same data contained in other files, just in a wide format. Hence, only the long-format files containing the same data were imported into the BigQuery database.
2) To be able to compare and measure correlations among different variables based on hourly records, I decided to create a new table using a LEFT JOIN on the columns Id and ActivityHour. I repeated the same JOIN on the tables with minute records. Hence, I obtained 2 new tables:
- hourly_activity.csv,
- minute_activity.csv.
3) To validate most of the columns containing DATE and DATETIME values that were imported as the STRING data type, I used the PARSE_DATE() and PARSE_DATETIME() commands. While importing the
- heartrate_seconds_merged.csv,
- hourlyCalories_merged.csv,
- hourlyIntensities_merged.csv,
- hourlySteps_merged.csv,
- minutesCaloriesNarrow_merged.csv,
- minuteIntensitiesNarrow_merged.csv,
- minuteMETsNarrow_merged.csv,
- minuteSleep_merged.csv,
- minuteSteps_merged.csv,
- sleepDay_merged.csv,
- weightLogInfo_merged.csv
files to BigQuery, it was necessary to import the DATETIME and DATE type columns as STRING, because the original syntax used in the CSV files couldn't be recognized as a correct DATETIME data type, due to the "AM" and "PM" text at the end of the expression. A sketch of the subsequent parsing step is shown below.
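A sketch of that parsing step, assuming a staging table with the original STRING column (the table name my_project.fitbit.hourly_calories_raw and its columns are illustrative, and the format string matches the CSV's "4/12/2016 12:00:00 AM" style):

-- Convert the AM/PM-style STRING timestamps into a proper DATETIME
SELECT
Id,
PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', ActivityHour) AS activity_hour,
Calories
FROM my_project.fitbit.hourly_calories_raw;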
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. For more information please see this site.
To aid analysis a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables and enables a more streamlined user experience.
DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
The AlphaFold Protein Structure Database is a collection of protein structure predictions made using the machine learning model AlphaFold. AlphaFold was developed by DeepMind, and this database was created in partnership with EMBL-EBI. For information on how to interpret, download, and query the data, as well as on which proteins are included / excluded, and the change log, please see our main dataset guide and FAQs. To interactively view individual entries or to download proteomes / Swiss-Prot, please visit https://alphafold.ebi.ac.uk/.

The current release aims to cover most of the over 200M sequences in UniProt (a commonly used reference set of annotated proteins). The files provided for each entry include the structure plus two model confidence metrics (pLDDT and PAE). The files can be found in the Google Cloud Storage bucket gs://public-datasets-deepmind-alphafold-v4, with metadata in the BigQuery table bigquery-public-data.deepmind_alphafold.metadata.

If you use this data, please cite:
- Jumper, J et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021)
- Varadi, M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2021)

This public dataset is hosted in Google Cloud Storage and is available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
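A hedged example against the metadata table named above, assuming columns such as organismScientificName and globalMetricValue (the mean pLDDT); check the table schema before relying on these names:

-- Prediction counts and mean pLDDT per organism (column names assumed)
SELECT
organismScientificName,
COUNT(*) AS predictions,
AVG(globalMetricValue) AS mean_plddt
FROM `bigquery-public-data.deepmind_alphafold.metadata`
GROUP BY organismScientificName
ORDER BY predictions DESC
LIMIT 10;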
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
RxNorm is a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source info
RXNCUI contains retired rxcui codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
Fork this kernel to get started with this dataset.
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
Banner Photo by @freestocks from Unsplash.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
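For instance, the phenylephrine question could start from the RXNCONSO table with something like the following; the table name rxnconso_current is a placeholder, since the BigQuery tables carry release suffixes, and should be replaced with an actual table from the dataset:

-- Term types and strings mentioning phenylephrine (table name is a placeholder)
SELECT DISTINCT tty, str
FROM `bigquery-public-data.nlm_rxnorm.rxnconso_current`
WHERE LOWER(str) LIKE '%phenylephrine%'
LIMIT 20;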
Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
-- Queries in SQL for the ETL process.
-- creating the first target table to capture the entire year.
SELECT
TRI.usertype,
ZIPSTART.zip_code AS zip_code_start,
ZIPSTARTNAME.borough AS borough_start,
ZIPSTARTNAME.neighborhood AS neighborhood_start,
ZIPEND.zip_code AS zip_code_end,
ZIPENDNAME.borough AS borough_end,
ZIPENDNAME.neighborhood AS neighborhood_end,
-- Since this is a fictional dashboard, we will add 6 years to make it look recent
DATE_ADD(DATE(TRI.starttime), INTERVAL 6 YEAR) AS start_day,
DATE_ADD(DATE(TRI.stoptime), INTERVAL 6 YEAR) AS stop_day,
WEA.temp AS day_mean_temperature, -- Mean temperature
WEA.wdsp AS day_mean_wind_speed, -- Mean wind speed
WEA.prcp AS day_total_precipitation, -- Total precipitation
-- Group trips into 10-minute intervals to reduce the number of rows
ROUND(CAST(TRI.tripduration / 60 AS INT64), -1) AS trip_minutes,
COUNT(TRI.bikeid) AS trip_count
FROM
`bigquery-public-data.new_york_citibike.citibike_trips` AS TRI
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPSTART
ON ST_WITHIN(
ST_GEOGPOINT(TRI.start_station_longitude, TRI.start_station_latitude),
ZIPSTART.zip_code_geom)
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPEND
ON ST_WITHIN(
ST_GEOGPOINT(TRI.end_station_longitude, TRI.end_station_latitude),
ZIPEND.zip_code_geom)
INNER JOIN
-- Wildcard table over the yearly NOAA GSOD tables (gsod2014, gsod2015, ...)
`bigquery-public-data.noaa_gsod.gsod20*` AS WEA
ON PARSE_DATE("%Y%m%d", CONCAT(WEA.year, WEA.mo, WEA.da)) = DATE(TRI.starttime)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPSTARTNAME
ON ZIPSTART.zip_code = CAST(ZIPSTARTNAME.zip AS STRING)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPENDNAME
ON ZIPEND.zip_code = CAST(ZIPENDNAME.zip AS STRING)
WHERE
-- This takes the weather data from New York Central Park, weather station id 94728
WEA.wban = '94728'
-- Use data from 2014 and 2015
AND EXTRACT(YEAR FROM DATE(TRI.starttime)) BETWEEN 2014 AND 2015
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13;
-- creating the second target table to capture summer seasons.
-- we will define summer as June to August.
SELECT
TRI.usertype,
TRI.start_station_longitude,
TRI.start_station_latitude,
TRI.end_station_longitude,
TRI.end_station_latitude,
ZIPSTART.zip_code AS zip_code_start,
ZIPSTARTNAME.borough AS borough_start,
ZIPSTARTNAME.neighborhood AS neighborhood_start,
ZIPEND.zip_code AS zip_code_end,
ZIPENDNAME.borough AS borough_end,
ZIPENDNAME.neighborhood AS neighborhood_end,
-- Since we're using trips from 2014 and 2015, we will add 6 years to make it look recent
DATE_ADD(DATE(TRI.starttime), INTERVAL 6 YEAR) AS start_day,
DATE_ADD(DATE(TRI.stoptime), INTERVAL 6 YEAR) AS stop_day,
WEA.temp AS day_mean_temperature, -- Mean temperature
WEA.wdsp AS day_mean_wind_speed, -- Mean wind speed
WEA.prcp AS day_total_precipitation, -- Total precipitation
-- We will group trips into 10 minute intervals, which also reduces the number of rows
ROUND(CAST(TRI.tripduration / 60 AS INT64), -1) AS trip_minutes,
TRI.bikeid
FROM
`bigquery-public-data.new_york_citibike.citibike_trips` AS TRI
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPSTART
ON ST_WITHIN(
ST_GEOGPOINT(TRI.start_station_longitude, TRI.start_station_latitude),
ZIPSTART.zip_code_geom)
INNER JOIN
`bigquery-public-data.geo_us_boundaries.zip_codes` AS ZIPEND
ON ST_WITHIN(
ST_GEOGPOINT(TRI.end_station_longitude, TRI.end_station_latitude),
ZIPEND.zip_code_geom)
INNER JOIN
`bigquery-public-data.noaa_gsod.gsod20*` AS WEA
ON PARSE_DATE("%Y%m%d", CONCAT(WEA.year, WEA.mo, WEA.da)) = DATE(TRI.starttime)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPSTARTNAME
ON ZIPSTART.zip_code = CAST(ZIPSTARTNAME.zip AS STRING)
INNER JOIN
`my-project-for-da-cert-1.cyclistic.nyc_zips` AS ZIPENDNAME
ON ZIPEND.zip_code = CAST(ZIPENDNAME.zip AS STRING)
WHERE
-- Take the weather from the same New York Central Park weather station, id 94728
WEA.wban = '94728'
-- Use data for the three summer months
AND DATE(TRI.starttime) BETWEEN DATE('2015-06-01') AND DATE('2015-08-31');