100+ datasets found
  1. Google (Alphabet) Stock Market Dataset (2004–2026)

    • kaggle.com
    zip
    Updated Mar 19, 2026
    Cite
    Anadi Gupta (2026). Google (Alphabet) Stock Market Dataset (2004–2026) [Dataset]. https://www.kaggle.com/datasets/anadiskt/google-alphabet-stock-market-dataset-20042026
    Explore at:
    Available download formats: zip (431757 bytes)
    Dataset updated
    Mar 19, 2026
    Authors
    Anadi Gupta
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📊 Google (Alphabet) Stock Dataset (2004–2026)

    🧾 Context

    This dataset provides a comprehensive collection of historical stock market data for Google (Alphabet Inc.), one of the most influential technology companies in the world.

    Since its IPO in 2004, Google has demonstrated significant growth, innovation, and market dominance. Its stock performance reflects broader trends in the technology sector and global economy, making it highly valuable for investors, analysts, and data scientists.

    This dataset enables users to explore long-term trends, perform financial analysis, and build predictive models using real-world stock market data.

    📦 Content

    The dataset contains clean, structured, and analysis-ready daily stock data for Google (GOOGL).

    📁 Files Included:

    • google_stock_price_daily.csv → Raw daily stock data
    • google_stock_master_dataset.csv → Cleaned and combined dataset

    📊 Features:

    • Daily stock prices (Open, High, Low, Close)
    • Adjusted Close prices
    • Trading Volume
    • Time-series formatted data
    • Preprocessed for analysis and modeling

    📊 Column Description

    Column Name | Description
    Date | Trading date
    Open | Opening price of the stock
    High | Highest price during the day
    Low | Lowest price during the day
    Close | Closing price of the stock
    Adj Close | Adjusted closing price (accounts for splits and dividends)
    Volume | Number of shares traded
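
    As a rough illustration of how the columns above might be used, the sketch below computes simple daily returns from the Adj Close column with Python's standard csv module. The sample rows are made up for the example (they are not real prices), the exact header spelling in the CSV files is an assumption, and rows are assumed to be sorted by ascending date.

```python
import csv
import io

# Made-up sample rows in the column layout described above (not real prices).
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100.0,102.0,99.0,101.0,101.0,1000000
2024-01-03,101.0,103.0,100.0,102.01,102.01,1200000
2024-01-04,102.0,102.5,98.0,99.9698,99.9698,1500000
"""


def daily_returns(csv_file):
    """Return (date, simple return) pairs computed from the 'Adj Close' column.

    Assumes rows are sorted by ascending date and that the header names
    match the column description (an assumption about the real files).
    """
    rows = list(csv.DictReader(csv_file))
    returns = []
    # Pair each row with the one before it: return = price_t / price_{t-1} - 1.
    for prev, curr in zip(rows, rows[1:]):
        prev_price = float(prev["Adj Close"])
        curr_price = float(curr["Adj Close"])
        returns.append((curr["Date"], curr_price / prev_price - 1.0))
    return returns


returns = daily_returns(io.StringIO(SAMPLE_CSV))
for date, r in returns:
    print(f"{date}: {r:+.4f}")
```

    To run this against the real data, one would presumably pass an open file handle for google_stock_master_dataset.csv instead of the in-memory sample.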

    🔍 Potential Use Cases

    This dataset is ideal for:

    • 📈 Exploratory Data Analysis (EDA)
    • 🔮 Time Series Forecasting
    • 🤖 Machine Learning Models
    • 📊 Data Visualization
    • 📉 Volatility & Risk Analysis
    • 💼 Financial Market Research

    🧹 Data Collection & Processing

    • Data sourced from publicly available financial platforms (e.g., Yahoo Finance, Nasdaq)
    • Cleaned and structured for consistency
    • Handled missing values and formatting issues
    • Standardized column names for usability
    • Prepared for direct use in analysis and ML workflows

    🌟 Why This Dataset?

    • ✅ Covers long-term historical data (2004–2026)
    • ✅ Clean and well-structured format
    • ✅ Beginner-friendly and expert-ready
    • ✅ Suitable for EDA, ML, and financial modeling
    • ✅ High usability for Kaggle projects

    📌 Notes

    • All prices are in USD
    • Dataset follows a daily frequency
    • Adjusted Close accounts for corporate actions

    🙏 Acknowledgement

    This dataset is compiled using publicly available financial data sources.
    It is intended for educational and research purposes only.

    ⚠️ Disclaimer

    This dataset does not provide financial advice.
    Users should conduct their own research before making any investment decisions.

    🏷️ Tags

    finance stock market time series stocks historical data
    financial data machine learning data analysis forecasting
    prediction google alphabet GOOGL

    🚀 If You Find This Useful

    ⭐ Please consider upvoting the dataset
    💬 Feedback and suggestions are always welcome
    🔁 Feel free to fork and build your own analysis

  2. Google Sales by Segment

    • kaggle.com
    zip
    Updated Nov 28, 2025
    Cite
    Devaang Barthwal (2025). Google Sales by Segment [Dataset]. https://www.kaggle.com/datasets/devaangbarthwal/google-sales-by-segment
    Explore at:
    Available download formats: zip (686 bytes)
    Dataset updated
    Nov 28, 2025
    Authors
    Devaang Barthwal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📊 Alphabet Inc. (Google) Segment Revenue History (2013-2024)

    This dataset provides a comprehensive, standardized, and time-series view of Alphabet Inc.'s (Google's) segment revenues, compiled from various quarterly and annual disclosures (primarily Forms 10-K and Earnings Releases) spanning from 2013 through 2024.

    The data has been meticulously cleaned to account for significant shifts in corporate financial reporting structure over the decade, making it immediately useful for longitudinal analysis. All figures are presented in millions of U.S. Dollars.

    Dataset Structure & Standardization

    The original data categories have been unified into a consistent set of line items to facilitate analysis across all years:

    Standardized Category | Corresponding Historical Names | Time Range (Availability)
    Google properties | Google websites | 2013–2016 (Discontinued)
    Google Search & other | - | 2017–2024 (Successor to part of "Google properties")
    YouTube ads | YouTube ads (1) | 2017–2024 (Successor to part of "Google properties")
    Google Network | Google Network Members' websites, Google Network Members' properties | 2013–2024
    Google subscriptions, platforms, and devices | Google other revenues, Google other | 2013–2024
    Google Cloud | - | 2017–2024 (Carved out of "Google other" category)
    Other Bets | Other Bets revenues | 2013–2024
    Hedging gains | Hedging gains (losses) | 2020–2024
    Total Revenues | (Calculated) | 2013–2024

    Key Notes on Segment Definitions (Per Google 10-K Disclosures)

    This section provides crucial context for interpreting the historical revenue lines, as the definitions of these categories have evolved:

    • Advertising Revenue: Revenue from Google Network (formerly "Google Network Members' properties") is primarily generated through AdMob, AdSense, and Google Ad Manager. This revenue is generally reported on a gross basis, meaning the amounts billed to customers are recorded as revenue, and amounts paid to partners (Traffic Acquisition Costs, or TAC) are recorded separately in the cost of revenues.
    • Segment Reclassification (2017 Onward): Prior to 2017, advertising revenue was largely grouped under "Google properties" and "Google Network." Beginning in 2017, "Google properties" was split to provide greater detail for "Google Search & other" and "YouTube ads" (reflecting the increasing prominence of YouTube's advertising business).
    • Google subscriptions, platforms, and devices: This segment encompasses Alphabet's non-advertising revenues, which includes:
      • Sales of products like Pixel phones and other devices.
      • Revenues from the Google Play Store (app sales, in-app purchases), where revenue is recorded on a net basis as Google acts as an agent facilitating transactions between developers and users.
      • YouTube non-advertising revenues (e.g., YouTube Premium subscriptions and YouTube TV).
    • Google Cloud: This segment was explicitly broken out as a separate reporting unit beginning in 2017 to highlight its increasing scale and strategic importance. Revenue prior to 2017 was historically included within the older "Google other revenues" category.
    • Other Bets: This segment represents revenue from Alphabet's non-Google ventures (e.g., Waymo, Verily, Fiber).

    Potential Analysis

    This dataset is ideal for:

    • Time-Series Analysis: Tracking the growth rate of core advertising segments versus emerging businesses (Cloud, Subscriptions).
    • Strategic Shift Modeling: Analyzing the inflection point (c. 2017) when Google formally split its revenue segments, allowing for pre- and post-split comparisons.
    • Segment Weighting: Calculating the proportional contribution of high-growth segments like Google Cloud and YouTube Ads to the Total Revenues over time.
    • Trend Forecasting: Projecting future growth trajectories for each segment.
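
    The segment weighting idea above can be sketched in a few lines of Python. The revenue figures below are hypothetical round numbers chosen for illustration (they are not taken from the dataset), and the total here is simply the sum of the listed segments, with hedging gains left out for brevity.

```python
# Hypothetical revenue figures in millions of USD (illustrative only,
# not values from the dataset).
revenues = {
    "Google Search & other": 600.0,
    "YouTube ads": 100.0,
    "Google Network": 100.0,
    "Google subscriptions, platforms, and devices": 100.0,
    "Google Cloud": 90.0,
    "Other Bets": 10.0,
}


def segment_shares(segment_revenues):
    """Return each segment's proportional contribution to the summed total."""
    total = sum(segment_revenues.values())
    return {name: value / total for name, value in segment_revenues.items()}


shares = segment_shares(revenues)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

    Repeating this per year would give a share-over-time series for segments such as Google Cloud and YouTube ads.
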
  3. Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment &...

    • datarade.ai
    .json, .csv
    Updated Feb 3, 2025
    Cite
    Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Dataplex
    Area covered
    South Georgia and the South Sandwich Islands, Palau, Ethiopia, Bhutan, Sweden, Grenada, Guinea, Korea (Democratic People's Republic of), British Indian Ocean Territory, French Polynesia
    Description

    The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

    This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

    Dataset Highlights

    • Location-Specific Reviews – Reviews and ratings for the locations you provide.
    • Daily Updates – New reviews and rating changes updated automatically.
    • Historical Data Access – Retrieve past reviews where available.
    • AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.
    • Competitive Benchmarking – Compare performance across selected locations.

    Use Cases

    • Franchise & Retail Chains – Monitor brand reputation and performance across locations.
    • Hospitality & Restaurants – Track guest sentiment and service trends.
    • Healthcare & Medical Facilities – Understand patient feedback for specific locations.
    • Real Estate & Property Management – Analyze tenant and customer experiences through reviews.
    • Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

    Data Updates & Delivery

    • Update Frequency: Daily
    • Data Format: CSV for easy integration
    • Delivery: Secure file transfer (SFTP or cloud storage)

    Data Fields Include:

    • Business Name
    • Location Details
    • Star Ratings
    • Review Text
    • Timestamps
    • Reviewer Metadata

    Optional Add-Ons:

    • AI Sentiment Analysis – Aggregate trends by week, month, or year.
    • Custom Location Tracking – Tailor the dataset to fit your specific business needs.

    Ideal for

    • Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.
    • Business Analysts – Use structured review data to track customer sentiment over time.
    • Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.
    • Competitive Intelligence – Compare locations and benchmark against industry competitors.

    Why Choose This Dataset?

    • Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.
    • Scalable & Customizable – Track only the locations that matter to you.
    • Actionable Insights – AI-driven summaries for quick decision-making.
    • Easy Integration – Delivered in a structured format for seamless analysis.

    By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.

  4. GA data with json columns

    • kaggle.com
    zip
    Updated Oct 29, 2018
    Cite
    Colin Pearse (2018). GA data with json columns [Dataset]. https://www.kaggle.com/datasets/colinpearse/ga-analytics-with-json-columns
    Explore at:
    Available download formats: zip (75129330 bytes)
    Dataset updated
    Oct 29, 2018
    Authors
    Colin Pearse
    Description

    Context

    Making dataset "Google Analytics Customer Revenue Prediction" easier and quicker to parse.

    Content

    This is the same information as dataset "Google Analytics Customer Revenue Prediction" with the JSON columns expanded (flattened) into additional csv columns.
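
    A minimal sketch of what expanding (flattening) JSON columns might look like is shown below. The column names (fullVisitorId, device), the sample values, and the "column.key" naming scheme are assumptions made for illustration; the source does not say how the flattened columns are named, and only one level of nesting is handled here.

```python
import json


def flatten_json_columns(rows, json_columns):
    """Expand JSON-encoded columns into one flat column per top-level key.

    New columns are named "<column>.<key>"; this naming scheme is an
    assumption for illustration, not the dataset's documented convention.
    """
    flat_rows = []
    for row in rows:
        flat = {}
        for key, value in row.items():
            if key in json_columns:
                # Parse the JSON string and lift each key into its own column.
                for sub_key, sub_value in json.loads(value).items():
                    flat[f"{key}.{sub_key}"] = sub_value
            else:
                flat[key] = value
        flat_rows.append(flat)
    return flat_rows


# Hypothetical rows, as they might come out of csv.DictReader.
rows = [
    {"fullVisitorId": "1", "device": json.dumps({"browser": "Chrome", "isMobile": False})},
    {"fullVisitorId": "2", "device": json.dumps({"browser": "Safari", "isMobile": True})},
]

flat = flatten_json_columns(rows, json_columns={"device"})
print(flat[0])
```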

    Acknowledgements

    Thanks to the original dataset "Google Analytics Customer Revenue Prediction"; without it, this more compact but equally informative dataset could not exist.


  5. SEC Public Dataset

    • console.cloud.google.com
    Updated Aug 16, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=es-419 (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=es-419
    Explore at:
    Dataset updated
    Aug 16, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  6. Google Maps Dominance Strategy Dataset for Colorado Springs Local Businesses...

    • caseysseo.com
    • myseosites.blob.core.windows.net
    Updated Jan 11, 2025
    Cite
    Casey Miller (2025). Google Maps Dominance Strategy Dataset for Colorado Springs Local Businesses [Dataset]. https://caseysseo.com/this-will-make-you-the-dominant-force-in-colorado-springs-google-maps/
    Explore at:
    Dataset updated
    Jan 11, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    Colorado, Colorado Springs
    Variables measured
    Content Word Count, Tourist Visitor Volume, Colorado Springs Population, NAP Inconsistency Frequency, Consumer Google Maps Usage Rate, Local Search Market Penetration, Local Search Visit Conversion Rate, Customer Loss Rate for Non-Top Rankings
    Measurement technique
    Local search ranking analysis and tracking, Competitive positioning assessment, Citation consistency auditing across web directories, Review generation and response rate measurement, Customer behavior pattern analysis, Google Business Profile performance monitoring, Local content engagement metrics analysis
    Description

    Comprehensive dataset analyzing local search optimization strategies, Google Maps ranking factors, and proven methodologies for Colorado Springs businesses to achieve dominant positioning in local search results. This dataset includes performance metrics, optimization techniques, customer behavior analysis, and systematic approaches for local SEO success.

  7. How to make google plus posts private - Dataset - openAFRICA

    • open.africa
    Updated Jan 4, 2018
    Cite
    (2018). How to make google plus posts private - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/how-to-make-google-plus-posts-private
    Explore at:
    Dataset updated
    Jan 4, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you must have a G+ account (for YouTube, location services, or other reasons), here is how to make it totally private so that no one can add you, send you spammy links, or otherwise annoy you. First, visit the "Audience Settings" page: https://plus.google.com/u/0/settings/audience. There you can set a "custom audience", which is usually used to restrict an account to people from a specific geographic location or within a specific age range. In this case, choose a custom audience of "No-one", check the box, and hit save. Now, when people try to visit your Google+ profile, they will see a "restricted" message. You can visit my G+ profile to see this working (https://plus.google.com/114725651137252000986). If anything is unclear, see this website: http://www.livehuntz.com/google-plus/support-phone-number

  8. Google Trends - International

    • console.cloud.google.com
    Updated Jul 22, 2018
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program (2018). Google Trends - International [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl
    Explore at:
    Dataset updated
    Jul 22, 2018
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google Search (http://google.com/)
    Google (http://google.com/)
    Description

    The International Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data for each country and region across the globe, where data is available. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  9. Google's Audioset: Reformatted

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Sep 21, 2022
    Cite
    Bakhtin (2022). Google's Audioset: Reformatted [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7096701
    Explore at:
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    Alexander
    Authors
    Bakhtin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Google's AudioSet consistently reformatted

    While working with Google's AudioSet (https://research.google.com/audioset/index.html), I ran into problems because the Weak (https://research.google.com/audioset/download.html) and Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different csv formatting, and because the labels used in the two datasets differ (https://github.com/audioset/ontology/issues/9) and are stored in files with different formatting.

    This dataset reformatting aims to unify the formats of the datasets so that it is possible to analyse them in the same pipelines, and also make the dataset files compatible with psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.

    For better formatted documentation and source code of reformatting refer to https://github.com/bakhtos/GoogleAudioSetReformatted

    -Changes in dataset

    All files are converted to tab-separated *.tsv files (i.e. csv files with \t as a separator). All files have a header as the first line.

    -New fields and filenames

    Fields are renamed according to the following table, to be compatible with psds_eval:

    Old field -> New field
    YTID -> filename
    segment_id -> filename
    start_seconds -> onset
    start_time_seconds -> onset
    end_seconds -> offset
    end_time_seconds -> offset
    positive_labels -> event_label
    label -> event_label
    present -> present

    For class label files, id is now the name of the field for the mid label (e.g. /m/09xor) and label is the name of the field for the human-readable label (e.g. Speech). The label index given for the Weak dataset (the index field in class_labels_indices.csv) is not used.

    Files are renamed according to the following table to ensure consistent naming of the form audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv:

    Old name -> New name
    balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
    unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
    eval_segments.csv -> audioset_weak_eval.tsv
    audioset_train_strong.tsv -> audioset_strong_train.tsv
    audioset_eval_strong.tsv -> audioset_strong_eval.tsv
    audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
    class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
    mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)

    -Strong dataset changes

    The only changes to the Strong dataset are the renaming of fields and the reordering of columns, so that both the Weak and Strong versions have filename and event_label as their first two columns.

    -Weak dataset changes

    -- Labels are given one per line, instead of comma-separated and quoted list

    -- To make sure that filename format is the same as in Strong version, the following format change is made: The value of the start_seconds field is converted to milliseconds and appended to the filename with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of filename with the Strong version and makes end_seconds also redundant.
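
    The two Weak dataset changes described above could be sketched roughly as follows. This is not the code from convert.py in the GitHub repository; the function name, the sample YTID, and the second label id are hypothetical, and the input is assumed to be one already-parsed row with its labels as a single comma-separated string.

```python
def weak_row_to_events(ytid, start_seconds, positive_labels):
    """Turn one Weak-format row into (filename, event_label) pairs.

    - start_seconds is converted to milliseconds and appended to the
      YTID with an underscore to form the filename.
    - the comma-separated label list is split into one label per row.
    """
    onset_ms = int(round(float(start_seconds) * 1000))
    filename = f"{ytid}_{onset_ms}"
    return [(filename, label.strip()) for label in positive_labels.split(",")]


# Hypothetical input row: the YTID and the second label id are made up.
events = weak_row_to_events("abc123XYZ_0", "30.000", "/m/09xor, /m/hypothetical")
for filename, event_label in events:
    print(filename, event_label, sep="\t")
```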

    -Class labels changes

    Class labels from both datasets are merged into one file and given in alphabetical order of ids. Since same ids are present in both datasets, but sometimes with different human-readable labels, labels from Strong dataset overwrite those from Weak. It is possible to regenerate class_labels.tsv while giving priority to the Weak version of labels by calling convert_labels(False) from convert.py in the GitHub repository.

    -License

    Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)

    Both the original dataset and this reworked version are licensed under CC BY 4.0

    Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

  10. SEC Public Dataset

    • console.cloud.google.com
    Updated May 12, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=ja (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=ja
    Explore at:
    Dataset updated
    May 12, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  11. SEC Public Dataset

    • console.cloud.google.com
    Updated May 14, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-CN (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-CN
    Explore at:
    Dataset updated
    May 14, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  12. SEC Public Dataset

    • console.cloud.google.com
    Updated Aug 18, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=pt-BR (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=pt-BR
    Explore at:
    Dataset updated
    Aug 18, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  13. SEC Public Dataset

    • console.cloud.google.com
    Updated Aug 18, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-TW (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-TW
    Explore at:
    Dataset updated
    Aug 18, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  14. civil_comments

    • tensorflow.org
    • huggingface.co
    Updated Feb 28, 2023
    Cite
    (2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
    Explore at:
    Dataset updated
    Feb 28, 2023
    Description

    This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
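
    To make the "fraction of annotators" idea concrete, here is a small sketch of how such a score could be derived from individual votes and then turned into a binary label. The votes are hypothetical, and the 0.5 cutoff is an assumption chosen for illustration; the description above does not specify a threshold.

```python
def attribute_fraction(annotator_votes):
    """Fraction of annotators who assigned the attribute (votes are booleans)."""
    if not annotator_votes:
        raise ValueError("need at least one annotator vote")
    return sum(annotator_votes) / len(annotator_votes)


# Hypothetical votes from four annotators on one comment.
votes = [True, False, True, True]
toxicity = attribute_fraction(votes)

# Assumed cutoff for a binary label; not stated in the dataset description.
THRESHOLD = 0.5
is_toxic = toxicity >= THRESHOLD

print(toxicity, is_toxic)
```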

    The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

    The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

    For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the civil_comments dataset.
    ds = tfds.load('civil_comments', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  15. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=de (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=de
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic
    • Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions on the Google Merchandise Store website

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
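    The placeholder values mentioned under Limitations can be awkward in downstream analysis, since a missing STRING arrives as text while a missing INTEGER arrives as null. The sketch below shows one way to make them uniform after rows have been fetched. The `normalize_row` helper and the sample row are hypothetical; only the placeholder string comes from the description above.

```python
from typing import Any, Dict, Mapping

# Placeholder string quoted in the dataset description for STRING fields with no data.
PLACEHOLDER = "Not available in demo dataset"


def normalize_row(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Map the STRING placeholder to None so missing values look uniform."""
    return {
        key: None if value == PLACEHOLDER else value
        for key, value in row.items()
    }


# Hypothetical flattened row; the field names and values are illustrative only.
row = {"city": "Not available in demo dataset", "visits": None, "browser": "Chrome"}
print(normalize_row(row))
```

    After this step, both kinds of missing value can be handled with a single None check.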

  16. About COVID-19 Public Datasets

    • console.cloud.google.com
    Updated Jun 19, 2022
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ES (2022). About COVID-19 Public Datasets [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-public-data-program?hl=ES
    Explore at:
    Dataset updated
    Jun 19, 2022
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Description

    In an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists and analysts. The program will host a repository of public datasets that relate to the COVID-19 crisis and make them free to access and analyze. These include datasets from the New York Times, European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap.

    Free hosting and queries of COVID datasets

    As with all data in the Google Cloud Public Datasets Program, Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier, where you can query up to 1TB free each month.

    Limitations and duration

    Queries of COVID data are free. If, during your analysis, you join COVID datasets with non-COVID datasets, the bytes processed in the non-COVID datasets will be counted against the free tier, then charged accordingly, to prevent abuse. Queries of COVID datasets will remain free until Sept 15, 2021.

    The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices & policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies.

    See the list of all datasets included in the program.

  17. Characterizing the Google Books Corpus: Strong Limits to Inferences of...

    • plos.figshare.com
    pdf
    Updated Jun 5, 2023
    Cite
    Eitan Adam Pechenick; Christopher M. Danforth; Peter Sheridan Dodds (2023). Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution [Dataset]. http://doi.org/10.1371/journal.pone.0137041
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Eitan Adam Pechenick; Christopher M. Danforth; Peter Sheridan Dodds
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It is tempting to treat frequency trends from the Google Books data sets as indicators of the “true” popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We use information theoretic methods to highlight these dynamics by examining and comparing major contributions via a divergence measure of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts. Overall, our findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
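    To make the idea of a divergence measure between decades concrete, here is a minimal sketch that compares two word-frequency distributions. The abstract does not name the measure, so Jensen-Shannon divergence is used here as one common choice; the function name and the toy word counts are assumptions for illustration and do not reproduce the authors' analysis.

```python
import math
from collections import Counter
from typing import Mapping


def jensen_shannon_divergence(p_counts: Mapping[str, int],
                              q_counts: Mapping[str, int]) -> float:
    """Jensen-Shannon divergence (in bits) between two non-empty word-count tables."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for word in set(p_counts) | set(q_counts):
        p = p_counts.get(word, 0) / p_total
        q = q_counts.get(word, 0) / q_total
        m = 0.5 * (p + q)  # mixture of the two distributions
        # Terms with zero probability contribute nothing to the sum.
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


# Toy counts standing in for two decades of a corpus (hypothetical data).
decade_a = Counter({"time": 50, "love": 30, "figure": 20})
decade_b = Counter({"time": 40, "love": 10, "figure": 50})
print(jensen_shannon_divergence(decade_a, decade_b))
```

    With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared words), and the per-word terms indicate which words contribute most to the difference.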

  18. Linked Open Data Management Services: A Comparison

    • zenodo.org
    • data.niaid.nih.gov
    • +2 more
    Updated Sep 18, 2023
    Cite
    Robert Nasarek; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
    Explore at:
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert Nasarek; Lozana Rossenova
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

    • ConedaKOR
    • LinkedDataHub
    • Metaphacts
    • Omeka S
    • ResearchSpace
    • Vitro
    • Wikibase
    • WissKI

    The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

    The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

    [1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.

    [2] Full paper will be made available open access in the conference proceedings.

  19. Data from: Fostering cultures of open qualitative research: Dataset 1 –...

    • orda.shef.ac.uk
    docx
    Updated Oct 8, 2025
    + more versions
    Cite
    Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 1 – Survey Responses [Dataset]. http://doi.org/10.15131/shef.data.23567250.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard; Itzel San Roman Pineda
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.

    The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

    • Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
    • Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
    • Fostering cultures of open qualitative research: Dataset 3 – Coding Book

    The project was funded with £13,913.85 Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

    The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

    ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license.

    This dataset comprises one spreadsheet with N=91 anonymised survey responses in .xlsx format. It includes all responses to the project survey, which used Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheets, or open-source equivalents.

    The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.

    The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targeted Russell Group) Universities. This involved speculative emails and a call for participants on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As a result, the responses include a snowball sample of scholars from elsewhere.

    The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.

    The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.

    A set of survey questions gathering eligibility criteria detail and consent are not listed within this dataset, as below. All responses provided in the dataset gained a ‘Yes’ response to all the below questions (with the exception of one question, marked with an asterisk (*) below):

    • I am aged 18 or over
    • I have read the information and consent statement and above.
    • I understand how to ask questions and/or raise a query or concern about the survey.
    • I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named.
    • I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield
    • I understand that I can withdraw from the research at any time.
    • I assign the copyright I hold in materials generated as part of this project to The University of Sheffield.
    • * I am happy to be contacted after the survey to take part in an interview.

    The project was undertaken by two staff:

    Co-investigator: Dr. Itzel San Roman Pineda, ORCiD ID: 0000-0002-3785-8057, i.sanromanpineda@sheffield.ac.uk, Postdoctoral Research Assistant

    Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard, ORCiD ID: 0000-0003-2460-8638, m.s.hanchard@sheffield.ac.uk, Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science

  20. The Quick, Draw! Dataset

    • github.com
    • carrfratagen43.blogspot.com
    Updated Mar 1, 2017
    + more versions
    Cite
    Google (2017). The Quick, Draw! Dataset [Dataset]. https://github.com/googlecreativelab/quickdraw-dataset
    Explore at:
    Dataset updated
    Mar 1, 2017
    Dataset provided by
    Google
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game "Quick, Draw!". The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located.

    Example drawings: https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/preview.jpg

Cite
Anadi Gupta (2026). Google (Alphabet) Stock Market Dataset (2004–2026) [Dataset]. https://www.kaggle.com/datasets/anadiskt/google-alphabet-stock-market-dataset-20042026

Google (Alphabet) Stock Market Dataset (2004–2026)

Comprehensive Google stock dataset with daily prices, volume, and market trends

Explore at:
Available download formats: zip (431757 bytes)
Dataset updated
Mar 19, 2026
Authors
Anadi Gupta
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

📊 Google (Alphabet) Stock Dataset (2004–2026)

🧾 Context

This dataset provides a comprehensive collection of historical stock market data for Google (Alphabet Inc.), one of the most influential technology companies in the world.

Since its IPO in 2004, Google has demonstrated significant growth, innovation, and market dominance. Its stock performance reflects broader trends in the technology sector and global economy, making it highly valuable for investors, analysts, and data scientists.

This dataset enables users to explore long-term trends, perform financial analysis, and build predictive models using real-world stock market data.

📦 Content

The dataset contains clean, structured, and analysis-ready daily stock data for Google (GOOGL).

📁 Files Included:

  • google_stock_price_daily.csv → Raw daily stock data
  • google_stock_master_dataset.csv → Cleaned and combined dataset

📊 Features:

  • Daily stock prices (Open, High, Low, Close)
  • Adjusted Close prices
  • Trading Volume
  • Time-series formatted data
  • Preprocessed for analysis and modeling

📊 Column Description

Column Name   Description
Date          Trading date
Open          Opening price of the stock
High          Highest price during the day
Low           Lowest price during the day
Close         Closing price of the stock
Adj Close     Adjusted closing price (accounts for splits and dividends)
Volume        Number of shares traded
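
As a minimal sketch of working with these columns, the example below parses rows with the documented header and computes simple daily returns from Adj Close. The three sample rows are made up for illustration and are not taken from the dataset; the date format, the ascending date order, and the `daily_returns` helper are likewise assumptions.

```python
import csv
import io
from typing import List, Tuple

# Made-up sample rows using the documented column names (not real prices).
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100.0,102.0,99.0,101.0,100.0,1000000
2024-01-03,101.0,106.0,100.5,105.0,105.0,1200000
2024-01-04,105.0,105.5,98.0,99.75,99.75,1500000
"""


def daily_returns(csv_text: str) -> List[Tuple[str, float]]:
    """Return (date, simple return) pairs from consecutive Adj Close values.

    Assumes the rows are already sorted by date in ascending order.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    returns = []
    for prev, curr in zip(rows, rows[1:]):
        prev_price = float(prev["Adj Close"])
        curr_price = float(curr["Adj Close"])
        returns.append((curr["Date"], curr_price / prev_price - 1.0))
    return returns


for date, ret in daily_returns(SAMPLE_CSV):
    print(date, f"{ret:.4f}")
```

Using Adj Close rather than Close keeps the returns comparable across splits and dividends, which is the reason the column exists. To run this against the real file, the text of google_stock_price_daily.csv could be passed in instead of the sample, assuming its header matches the table above.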

🔍 Potential Use Cases

This dataset is ideal for:

  • 📈 Exploratory Data Analysis (EDA)
  • 🔮 Time Series Forecasting
  • 🤖 Machine Learning Models
  • 📊 Data Visualization
  • 📉 Volatility & Risk Analysis
  • 💼 Financial Market Research

🧹 Data Collection & Processing

  • Data sourced from publicly available financial platforms (e.g., Yahoo Finance, Nasdaq)
  • Cleaned and structured for consistency
  • Handled missing values and formatting issues
  • Standardized column names for usability
  • Prepared for direct use in analysis and ML workflows

🌟 Why This Dataset?

  • ✅ Covers long-term historical data (2004–2026)
  • ✅ Clean and well-structured format
  • ✅ Beginner-friendly and expert-ready
  • ✅ Suitable for EDA, ML, and financial modeling
  • ✅ High usability for Kaggle projects

📌 Notes

  • All prices are in USD
  • Dataset follows a daily frequency
  • Adjusted Close accounts for corporate actions

🙏 Acknowledgement

This dataset is compiled using publicly available financial data sources.
It is intended for educational and research purposes only.

⚠️ Disclaimer

This dataset does not provide financial advice.
Users should conduct their own research before making any investment decisions.

🏷️ Tags

finance, stock market, time series, stocks, historical data,
financial data, machine learning, data analysis, forecasting,
prediction, google, alphabet, GOOGL

🚀 If You Find This Useful

⭐ Please consider upvoting the dataset
💬 Feedback and suggestions are always welcome
🔁 Feel free to fork and build your own analysis
