100+ datasets found
  1. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  2. GCP-Cloud-Billing-Data

    • kaggle.com
    zip
    Updated Aug 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SAIRAM N (2024). GCP-Cloud-Billing-Data [Dataset]. https://www.kaggle.com/datasets/sairamn19/gcp-cloud-billing-data
    Explore at:
    zip(59956 bytes)Available download formats
    Dataset updated
    Aug 30, 2024
    Authors
    SAIRAM N
    License

    https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

    Description

    The importance of effectively using Google Cloud Platform (GCP) billing data to gain actionable insights into cloud spending. It emphasizes the need for strategic cost management, offering guidance on how to analyze billing data, optimize resource usage, and implement best practices to minimize costs while maximizing the value derived from cloud services. The subtitle is geared towards businesses and technical teams looking to maintain financial control and improve their cloud operations.

    This dataset contains the data of GCP billing cloud cost. For a updated one, comment ! contact !

  3. MultiversX Blockchain

    • console.cloud.google.com
    Updated Jan 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data (2024). MultiversX Blockchain [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/blockchain-analytics-multiversx-mainnet-eu
    Explore at:
    Dataset updated
    Jan 10, 2024
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    MultiversX is a highly scalable, secure and decentralized blockchain network created to enable radically new applications, for users, businesses, society, and the new metaverse frontier. This dataset is one of many crypto datasets that are available within Google Cloud Public Datasets . As with other Google Cloud public datasets, you can query this dataset for free, up to 1TB/month of free processing, every month. Watch this short video to learn how to get started with the public datasets.

  4. covid19-public-forecasts

    • kaggle.com
    zip
    Updated Aug 13, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2020). covid19-public-forecasts [Dataset]. https://www.kaggle.com/bigquery/covid19-public-forecasts
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Aug 13, 2020
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    Description

    Context

    In partnership with the Harvard Global Health Institute, Google Cloud is releasing the COVID-19 Public Forecasts to serve as an additional resource for first responders in healthcare, the public sector, and other impacted organizations preparing for what lies ahead. These forecasts are available for free and provide a projection of COVID-19 cases, deaths, and other metrics over the next 14 days for US counties and states. For more info, see https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-is-releasing-the-covid-19-public-forecasts and https://storage.googleapis.com/covid-external/COVID-19ForecastWhitePaper.pdf

    Content

    A projection of COVID-19 cases, deaths, and other metrics over the next 14 days for US counties and states

    Acknowledgements

    Released on BigQuery by Google Cloud:

    https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-is-releasing-the-covid-19-public-forecasts

    https://pantheon.corp.google.com/marketplace/product/bigquery-public-datasets/covid19-public-forecasts

  5. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  6. Google Cloud Release Notes

    • console.cloud.google.com
    Updated May 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=fr (2023). Google Cloud Release Notes [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google_cloud_release_notes?hl=fr
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    This table contains release notes for the majority of generally available Google Cloud products found on cloud.google.com . You can use this BigQuery public dataset to consume release notes programmatically across all products. HTML versions of release notes are available within each product's documentation and also in a filterable format at https://console.cloud.google.com/release-notes . This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  7. Looker Ecommerce BigQuery Dataset

    • kaggle.com
    Updated Jan 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Mustafa Keser
    Description

    Looker Ecommerce Dataset Description

    CSV version of Looker Ecommerce Dataset.

    Overview Dataset in BigQuery TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information >about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this >dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and >evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This >means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on >this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public >datasets.

    1. distribution_centers.csv

    • Columns:
      • id: Unique identifier for each distribution center.
      • name: Name of the distribution center.
      • latitude: Latitude coordinate of the distribution center.
      • longitude: Longitude coordinate of the distribution center.

    2. events.csv

    • Columns:
      • id: Unique identifier for each event.
      • user_id: Identifier for the user associated with the event.
      • sequence_number: Sequence number of the event.
      • session_id: Identifier for the session during which the event occurred.
      • created_at: Timestamp indicating when the event took place.
      • ip_address: IP address from which the event originated.
      • city: City where the event occurred.
      • state: State where the event occurred.
      • postal_code: Postal code of the event location.
      • browser: Web browser used during the event.
      • traffic_source: Source of the traffic leading to the event.
      • uri: Uniform Resource Identifier associated with the event.
      • event_type: Type of event recorded.

    3. inventory_items.csv

    • Columns:
      • id: Unique identifier for each inventory item.
      • product_id: Identifier for the associated product.
      • created_at: Timestamp indicating when the inventory item was created.
      • sold_at: Timestamp indicating when the item was sold.
      • cost: Cost of the inventory item.
      • product_category: Category of the associated product.
      • product_name: Name of the associated product.
      • product_brand: Brand of the associated product.
      • product_retail_price: Retail price of the associated product.
      • product_department: Department to which the product belongs.
      • product_sku: Stock Keeping Unit (SKU) of the product.
      • product_distribution_center_id: Identifier for the distribution center associated with the product.

    4. order_items.csv

    • Columns:
      • id: Unique identifier for each order item.
      • order_id: Identifier for the associated order.
      • user_id: Identifier for the user who placed the order.
      • product_id: Identifier for the associated product.
      • inventory_item_id: Identifier for the associated inventory item.
      • status: Status of the order item.
      • created_at: Timestamp indicating when the order item was created.
      • shipped_at: Timestamp indicating when the order item was shipped.
      • delivered_at: Timestamp indicating when the order item was delivered.
      • returned_at: Timestamp indicating when the order item was returned.

    5. orders.csv

    • Columns:
      • order_id: Unique identifier for each order.
      • user_id: Identifier for the user who placed the order.
      • status: Status of the order.
      • gender: Gender information of the user.
      • created_at: Timestamp indicating when the order was created.
      • returned_at: Timestamp indicating when the order was returned.
      • shipped_at: Timestamp indicating when the order was shipped.
      • delivered_at: Timestamp indicating when the order was delivered.
      • num_of_item: Number of items in the order.

    6. products.csv

    • Columns:
      • id: Unique identifier for each product.
      • cost: Cost of the product.
      • category: Category to which the product belongs.
      • name: Name of the product.
      • brand: Brand of the product.
      • retail_price: Retail price of the product.
      • department: Department to which the product belongs.
      • sku: Stock Keeping Unit (SKU) of the product.
      • distribution_center_id: Identifier for the distribution center associated with the product.

    7. users.csv

    • Columns:
      • id: Unique identifier for each user.
      • first_name: First name of the user.
      • last_name: Last name of the user.
      • email: Email address of the user.
      • age: Age of the user.
      • gender: Gender of the user.
      • state: State where t...
  8. Stack Overflow Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Stack Overflowhttp://stackoverflow.com/
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

    Content

    Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: https://archive.org/download/stackexchange

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

    https://cloud.google.com/bigquery/public-data/stackoverflow

    Banner Photo by Caspar Rubin from Unplash.

    Inspiration

    What is the percentage of questions that have been answered over the years?

    What is the reputation and badge count of users across different tenures on StackOverflow?

    What are 10 of the “easier” gold badges to earn?

    Which day of the week has most questions answered within an hour?

  9. The Met Public Domain Art Works

    • console.cloud.google.com
    Updated Nov 5, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:The%20Met&hl=de (2023). The Met Public Domain Art Works [Dataset]. https://console.cloud.google.com/marketplace/product/the-metropolitan-museum-of-art/the-met-public-domain-art-works?hl=de&jsmode
    Explore at:
    Dataset updated
    Nov 5, 2023
    Dataset provided by
    Googlehttp://google.com/
    Description

    The Metropolitan Museum of Art, better known as the Met, provides a public domain dataset with over 200,000 objects including metadata and images. In early 2017, the Met debuted their Open Access policy to make part of their collection freely available for unrestricted use under the Creative Commons Zero designation and their own terms and conditions. This dataset provides a new view to one of the world’s premier collections of fine art. The data includes both image in Google Cloud Storage, and associated structured data in two BigQuery two tables, objects and images (1:N). Locations to images on both The Met’s website and in Google Cloud Storage are available in the BigQuery table. The meta data for this public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . The image data for this public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.

  10. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Canada, Isle of Man, Moldova (Republic of), Tunisia, British Indian Ocean Territory, Bangladesh, Andorra, Taiwan, Nepal, Northern Mariana Islands
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  11. OpenStreetMap Public Dataset

    • console.cloud.google.com
    Updated Apr 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:OpenStreetMap&hl=de (2023). OpenStreetMap Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/openstreetmap/geo-openstreetmap?hl=de
    Explore at:
    Dataset updated
    Apr 23, 2023
    Dataset provided by
    OpenStreetMap//www.openstreetmap.org/
    Googlehttp://google.com/
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and more than two million registered users who can add data by manual survey, GPS devices, aerial photography, and other free sources. We've made available a number of tables (explained in detail below): history_* tables: full history of OSM objects planet_* tables: snapshot of current OSM objects as of Nov 2019 The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types and an additional changeset corresponding to OSM edits for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated upon with the built-in geography functions to perform geometry and feature selection, additional processing. Example analyses are given below. This dataset is part of a larger effort to make data available in BigQuery through the Google Cloud Public Datasets program . OSM itself is produced as a public good by volunteers, and there are no guarantees about data quality. Interested in learning more about how these data were brought into BigQuery and how you can use them? Check out the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  12. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png" alt="enter image description here"> https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

  13. About COVID-19 Public Datasets

    • console.cloud.google.com
    Updated Jun 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=ko (2022). About COVID-19 Public Datasets [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-public-data-program?hl=ko
    Explore at:
    Dataset updated
    Jun 19, 2022
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Description

    In an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists and analysts. The program will host a repository of public datasets that relate to the COVID-19 crisis and make them free to access and analyze. These include datasets from the New York Times, European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap. Free hosting and queries of COVID datasets As with all data in the Google Cloud Public Datasets Program , Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier , where you can query up to 1TB free each month. Limitations and duration Queries of COVID data are free. If, during your analysis, you join COVID datasets with non-COVID datasets, the bytes processed in the non-COVID datasets will be counted against the free tier, then charged accordingly, to prevent abuse. Queries of COVID datasets will remain free until Sept 15, 2021. The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices & policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies. See the list of all datasets included in the program

  14. Data from: arXiv Dataset

    • kaggle.com
    • huggingface.co
    • +1more
    zip
    Updated Nov 22, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cornell University (2020). arXiv Dataset [Dataset]. https://www.kaggle.com/Cornell-University/arxiv
    Explore at:
    zip(950178574 bytes)Available download formats
    Dataset updated
    Nov 22, 2020
    Dataset authored and provided by
    Cornell University
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About ArXiv

    For nearly 30 years, ArXiv has served the public and research communities by providing open access to scholarly articles, from the vast branches of physics to the many subdisciplines of computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics. This rich corpus of information offers significant, but sometimes overwhelming depth.

    In these times of unique global challenges, efficient extraction of insights from data is essential. To help make the arXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more.

    Our hope is to empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.

    The dataset is freely available via Google Cloud Storage buckets (more info here). Stay tuned for weekly updates to the dataset!

    ArXiv is a collaboratively funded, community-supported resource founded by Paul Ginsparg in 1991 and maintained and operated by Cornell University.

    The release of this dataset was featured further in a Kaggle blog post here.

    https://storage.googleapis.com/kaggle-public-downloads/arXiv.JPG" alt="">

    See here for more information.

    ArXiv On Kaggle

    Metadata

    This dataset is a mirror of the original ArXiv data. Because the full dataset is rather large (1.1TB and growing), this dataset provides only a metadata file in the json format. This file contains an entry for each paper, containing: - id: ArXiv ID (can be used to access the paper, see below) - submitter: Who submitted the paper - authors: Authors of the paper - title: Title of the paper - comments: Additional info, such as number of pages and figures - journal-ref: Information about the journal the paper was published in - doi: https://www.doi.org - abstract: The abstract of the paper - categories: Categories / tags in the ArXiv system - versions: A version history

    You can access each paper directly on ArXiv using these links: - https://arxiv.org/abs/{id}: Page for this paper including its abstract and further links - https://arxiv.org/pdf/{id}: Direct link to download the PDF

    Bulk access

    The full set of PDFs is available for free in the GCS bucket gs://arxiv-dataset or through Google API (json documentation and xml documentation).

    You can use for example gsutil to download the data to your local machine. ```

    List files:

    gsutil cp gs://arxiv-dataset/arxiv/

    Download pdfs from March 2020:

    gsutil cp gs://arxiv-dataset/arxiv/arxiv/pdf/2003/ ./a_local_directory/

    Download all the source files

    gsutil cp -r gs://arxiv-dataset/arxiv/ ./a_local_directory/ ```

    Update Frequency

    We're automatically updating the metadata as well as the GCS bucket on a weekly basis.

    License

    Creative Commons CC0 1.0 Universal Public Domain Dedication applies to the metadata in this dataset. See https://arxiv.org/help/license for further details and licensing on individual papers.

    Acknowledgements

    The original data is maintained by ArXiv, huge thanks to the team for building and maintaining this dataset.

    We're using https://github.com/mattbierbaum/arxiv-public-datasets to pull the original data, thanks to Matt Bierbaum for providing this tool.

  15. geo-openstreetmap

    • kaggle.com
    zip
    Updated Apr 17, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2020). geo-openstreetmap [Dataset]. https://www.kaggle.com/bigquery/geo-openstreetmap
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Apr 17, 2020
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    Adapted from Wikipedia: OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. Created in 2004, it was inspired by the success of Wikipedia and more than two million registered users who can add data by manual survey, GPS devices, aerial photography, and other free sources.

    To aid researchers, data scientists, and analysts in the effort to combat COVID-19, Google is making a hosted repository of public datasets including OpenStreetMap data, free to access. To facilitate the Kaggle community to access the BigQuery dataset, it is onboarded to Kaggle platform which allows querying it without a linked GCP account. Please note that due to the large size of the dataset, Kaggle applies a quota of 5 TB of data scanned per user per 30-days.

    Content

    This is the OpenStreetMap (OSM) planet-wide dataset loaded to BigQuery.

    Tables: - history_* tables: full history of OSM objects. - planet_* tables: snapshot of current OSM objects as of Nov 2019.

    The history_* and planet_* table groups are composed of node, way, relation, and changeset tables. These contain the primary OSM data types and an additional changeset corresponding to OSM edits for convenient access. These objects are encoded using the BigQuery GEOGRAPHY data type so that they can be operated upon with the built-in geography functions to perform geometry and feature selection, additional processing.

    Resources

    You can read more about OSM elements on the OSM Wiki. This dataset uses BigQuery GEOGRAPHY datatype which supports a set of functions that can be used to analyze geographical data, determine spatial relationships between geographical features, and construct or manipulate GEOGRAPHYs.

  16. Software Architecture Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carlos Rian (2025). Software Architecture Dataset [Dataset]. https://www.kaggle.com/datasets/carlosrian/software-architecture-dataset
    Explore at:
    zip(32791767390 bytes)Available download formats
    Dataset updated
    Jun 25, 2025
    Authors
    Carlos Rian
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Dataset Overview

    This dataset contains augmented software architecture diagrams specifically focused on cloud service components from major cloud providers (AWS, Azure, GCP). The dataset was developed as part of a FIAP POS Tech Hackathon project for training computer vision models to detect and classify cloud architecture components in technical diagrams.

    Dataset Contents

    • 8k+ augmented images (PNG format) with corresponding Pascal VOC XML annotations
    • Cloud service components from AWS, Azure, and GCP platforms
    • 87 unique cloud service types including:
      • AWS services: EC2, S3, RDS, DynamoDB, Lambda, API Gateway, CloudFront, etc.
      • Azure services: Databricks, Storage, Managed Database, Load Balancer, etc.
      • GCP services: (various Google Cloud Platform components)

    Data Augmentation Process

    The dataset was created using a sophisticated augmentation pipeline that applies multiple transformations while preserving bounding box annotations:

    Augmentation Techniques Applied:

    • Brightness and contrast adjustments (±20%)
    • Rotation (±15 degrees)
    • Gaussian blur (sigma: 0.0-2.0)
    • Gaussian noise (variance: 10-50)
    • Elastic transformations (alpha: 1, sigma: 50)
    • Grid distortions (num_steps: 5, distort_limit: 0.3)
    • Optical distortions (distort_limit: 0.1, shift_limit: 0.1)

    Quality Control:

    • Relevance filtering: Only "High" relevance components were selected for augmentation
    • Bounding box preservation: All transformations maintain accurate object detection annotations
    • Format compatibility: Pascal VOC XML format for seamless integration with popular ML frameworks

    Technical Specifications

    • Image Format: PNG
    • Annotation Format: Pascal VOC XML
    • Augmentation Ratio: 10 variations per original image
    • Image Categories: Cloud architecture components and services
    • Bounding Box Accuracy: Preserved through geometric transformations

    Use Cases

    This dataset is ideal for: - Object detection in software architecture diagrams - Cloud service recognition in technical documentation - Automated diagram analysis tools - Computer vision research in technical domain - Training custom models for architecture diagram parsing

    Dataset Structure

    dataset_augmented/
    ├── image_xpto.png   # Augmented PNG images
    ├── image_xpto.xml   # Pascal VOC XML files
    

    Machine Learning Applications

    Perfect for training: - YOLO object detection models - Faster R-CNN for precise component detection - Custom CNN architectures for diagram analysis - Multi-class classification models

    Quality Assurance

    • All images maintain visual quality after augmentation
    • Bounding boxes are accurately transformed with image modifications
    • Consistent naming convention for easy batch processing
    • Validated XML structure for error-free training
  17. Learning Path Index Dataset

    • kaggle.com
    zip
    Updated Nov 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mani Sarkar (2024). Learning Path Index Dataset [Dataset]. https://www.kaggle.com/datasets/neomatrix369/learning-path-index-dataset/code
    Explore at:
    zip(151846 bytes)Available download formats
    Dataset updated
    Nov 6, 2024
    Authors
    Mani Sarkar
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description

    The Learning Path Index Dataset is a comprehensive collection of byte-sized courses and learning materials tailored for individuals eager to delve into the fields of Data Science, Machine Learning, and Artificial Intelligence (AI), making it an indispensable reference for students, professionals, and educators in the Data Science and AI communities.

    This Kaggle Dataset along with the KaggleX Learning Path Index GitHub Repo were created by the mentors and mentees of Cohort 3 KaggleX BIPOC Mentorship Program (between August 2023 and November 2023, also see this). See Credits section at the bottom of the long description.

    Inspiration

    This dataset was created out of a commitment to facilitate learning and growth within the Data Science, Machine Learning, and AI communities. It started off as an idea at the end of Cohort 2 of the KaggleX BIPOC Mentorship Program brainstorming and feedback session. It was one of the ideas to create byte-sized learning material to help our KaggleX mentees learn things faster. It aspires to simplify the process of finding, evaluating, and selecting the most fitting educational resources.

    Context

    This dataset was meticulously curated to assist learners in navigating the vast landscape of Data Science, Machine Learning, and AI education. It serves as a compass for those aiming to develop their skills and expertise in these rapidly evolving fields.

    The mentors and mentees communicated via Discord, Trello, Google Hangout, etc... to put together these artifacts and made them public for everyone to use and contribute back.

    Sources

    The dataset compiles data from a curated selection of reputable sources including leading educational platforms such as Google Developer, Google Cloud Skill Boost, IBM, Fast AI, etc. By drawing from these trusted sources, we ensure that the data is both accurate and pertinent. The raw data and other artifacts as a result of this exercise can be found on the GitHub Repo i.e. KaggleX Learning Path Index GitHub Repo.

    Content

    The dataset encompasses the following attributes:

    • Course / Learning Material: The title of the Data Science, Machine Learning, or AI course or learning material.
    • Source: The provider or institution offering the course.
    • Course Level: The proficiency level, ranging from Beginner to Advanced.
    • Type (Free or Paid): Indicates whether the course is available for free or requires payment.
    • Module: Specific module or section within the course.
    • Duration: The estimated time required to complete the module or course.
    • Module / Sub-module Difficulty Level: The complexity level of the module or sub-module.
    • Keywords / Tags / Skills / Interests / Categories: Relevant keywords, tags, or categories associated with the course with a focus on Data Science, Machine Learning, and AI.
    • Links: Hyperlinks to access the course or learning material directly.

    How to contribute to this initiative?

    • You can also join us by taking part in the next KaggleX BIPOC Mentorship program (also see this)
    • Keep your eyes open on the Kaggle Discussions page and other KaggleX social media channels. Or find us on the Kaggle Discord channel to learn more about the next steps
    • Create notebooks from this data
    • Create supplementary or complementary data for or from this dataset
    • Submit corrections/enhancements or anything else to help improve this dataset so it has a wider use and purpose

    License

    The Learning Path Index Dataset is openly shared under a permissive license, allowing users to utilize the data for educational, analytical, and research purposes within the Data Science, Machine Learning, and AI domains. Feel free to fork the dataset and make it your own, we would be delighted if you contributed back to the dataset and/or our KaggleX Learning Path Index GitHub Repo as well.

    Important Links

    Credits

    Credits for all the work done to create this Kaggle Dataset and the KaggleX [Learnin...

  18. Ethereum Transaction 20220901 to 20220911 (Repeat)

    • kaggle.com
    zip
    Updated Nov 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CJJ (2025). Ethereum Transaction 20220901 to 20220911 (Repeat) [Dataset]. https://www.kaggle.com/datasets/migrationxian/eth-tx-220901-220911
    Explore at:
    zip(2393150202 bytes)Available download formats
    Dataset updated
    Nov 6, 2025
    Authors
    CJJ
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Ethereum Transaction Data

    This dataset contains Ethereum transaction data from September 1, 2022 to September 11, 2022.

    Data Source

    This data is downloaded directly from Google BigQuery - Ethereum, a public dataset that provides comprehensive blockchain data.

    License

    Free and Open Source: This dataset is released under the CC0-1.0 (Creative Commons Zero) license, meaning it is in the public domain and free to use for any purpose without restrictions.

    Content

    The dataset includes detailed transaction information from the Ethereum blockchain during the specified time period, including transaction hashes, addresses, values, gas fees, and timestamps.

    File Information

    • ethtx_220901_220911_*: Ethereum transaction data files split into 41 shards (~9.6GB total)

    Time Range

    • Start: 2022-09-01
    • End: 2022-09-11

    Use Cases

    • Blockchain analysis
    • Transaction pattern research
    • Network behavior studies
    • Time series analysis
    • Academic research
    • Data science projects
  19. Global overview of cloud-, snow-, and shade-free Landsat (1982-2024) and...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Katarzyna Ewa Lewińska; Stefan Ernst; David Frantz; Ulf Leser; Patrick Hostert (2025). Global overview of cloud-, snow-, and shade-free Landsat (1982-2024) and Sentinel-2 (2015-2024) data [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwxm
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Humboldt-Universität zu Berlin
    Trier University of Applied Sciences
    Authors
    Katarzyna Ewa Lewińska; Stefan Ernst; David Frantz; Ulf Leser; Patrick Hostert
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Landsat and Sentinel-2 acquisitions are among the most frequently used medium-resolution (i.e., 10-30 m) optical data. The data are extensively used in terrestrial vegetation applications, including but not limited to, land cover and land use mapping, vegetation condition and phenology monitoring, and disturbance and change mapping. While the Landsat archives alone provide over 40 years, and counting, of continuous and consistent observations, since mid-2015 Sentinel-2 has enabled a revisit frequency of up to 2-days. Although the spatio-temporal availability of both data archives is well-known at the scene level, information on the actual availability of usable (i.e., cloud-, snow-, and shade-free) observations at the pixel level needs to be explored for each study to ensure correct parametrization of used algorithms, thus robustness of subsequent analyses. However, a priori data exploration is time and resource‑consuming, thus is rarely performed. As a result, the spatio-temporal heterogeneity of usable data is often inadequately accounted for in the analysis design, risking ill-advised selection of algorithms and hypotheses, and thus inferior quality of final results. Here we present a global dataset comprising precomputed daily availability of usable Landsat and Sentinel-2 data sampled at a pixel-level in a regular 0.18°-point grid. We based the dataset on the complete 1982-2024 Landsat surface reflectance data (Collection 2) and 2015-2024 Seninel-2 top-of-the-atmosphere reflectance scenes (pre‑Collection-1 and Collection-1). Derivation of cloud-, snow-, and shade-free observations followed the methodology developed in our recent study on data availability over Europe (Lewińska et al., 2023; https://doi.org/10.20944/preprints202308.2174.v2). Furthermore, we expanded the dataset with growing season information derived based on the 2001‑2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6). As such, our dataset presents a unique overview of the spatio-temporal availability of usable daily Landsat and Sentinel-2 data at the global scale, hence offering much-needed a priori information aiding the identification of appropriate methods and challenges for terrestrial vegetation analyses at the local to global scales. The dataset can be viewed using the dedicated GEE App (link in Related Works). As of February 2025 the dataset has been extended with the 2024 data. Methods We based our analyses on freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine (Gorelick et al., 2017). We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (Earth Resources Observation And Science (EROS) Center, 1982), Enhanced Thematic Mapper (ETM+) (Earth Resources Observation And Science (EROS) Center, 1999), and Operational Land Imager (OLI) (Earth Resources Observation And Science (EROS) Center, 2013) scanners between 22nd August 1982 and 31st December 2024, and Sentinel-2 TOA reflectance Level-1C scenes (pre‑Collection-1 (European Space Agency, 2015, 2021) and Collection-1 (European Space Agency, 2022)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2024. We implemented a conservative pixel-quality screening to identify cloud-, snow-, and shade-free land pixels. For the Landsat time series, we relied on the inherent pixel quality bands (Foga et al., 2017; Zhu & Woodcock, 2012) excluding all pixels flagged as cloud, snow, or shadow as well as pixels with the fill-in value of 20,000 (scale factor 0.0001; (Zhang et al., 2022)). Furthermore, due to the Landsat 7 orbit drift (Qiu et al., 2021) we excluded all ETM+ scenes acquired after 31st December 2020. Because Sentinel-2 Level-2A quality masks lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018), we resorted to Level-1C scenes accompanied by the supporting Cloud Probability product. Furthermore, we employed a selection of conditions, including a threshold on Band 10 (SWIR-Cirrus), which is not available at Level‑2A. Overall, our Sentinel-2-specific cloud, shadow, and snow screening comprised:

    exclusion of all pixels flagged as clouds and cirrus in the inherent ‘QA60’ cloud mask band; exclusion of all pixels with cloud probability >50% as defined in the corresponding Cloud Probability product available for each scene; exclusion of cirrus clouds (B10 reflectance >0.01); exclusion of clouds based on Cloud Displacement Analysis (CDI<‑0.5) (Frantz et al., 2018); exclusion of dark pixels (B8 reflectance <0.16) within cloud shadows modelled for each scene with scene‑specific sun parameters for the clouds identified in the previous steps. Here we assumed a cloud height of 2,000 m. exclusion of pixels within a 40-m buffer (two pixels at 20-m resolution) around each identified cloud and cloud shadow object. exclusion of snow pixels identified with a snow mask branch of the Sen2Cor processor (Main-Knorn et al., 2017).

    Through applying the data screening, we generated a collection of daily availability records for Landsat and Sentinel-2 data archives. We next subsampled the resulting binary time series with a regular 0.18° x 0.18°‑point grid defined in the EPSG:4326 projection, obtaining 475,150 points located over land between ‑179.8867°W and 179.5733°E and 83.50834°N and ‑59.05167°S. Owing to the substantial amount of data comprised in the Landsat and Sentinel-2 archives and the computationally demanding process of cloud-, snow-, and shade-screening, we performed the subsampling in batches corresponding to a 4° x 4° regular grid and consolidated the final data in post-processing. We derived the pixel-specific growing season information from the 2001-2019 time series of the yearly 500‑m MODIS land cover dynamics product (MCD12Q2; Collection 6) available in Google Earth Engine. We only used information on the start and the end of a growing season, excluding all pixels with quality below ‘best’. When a pixel went through more than one growing cycle per year, we approximated a growing season as the period between the beginning of the first growing cycle and the end of the last growing cycle. To fill in data gaps arising from low-quality data and insufficiently pronounced seasonality (Friedl et al., 2019), we used a 5x5 mean moving window filter to ensure better spatial continuity of our growing season datasets. Following (Lewińska et al., 2023), we defined the start of the season as the pixel-specific 25th percentile of the 2001-2019 distribution for the start of the season dates, and the end of the season as the pixel-specific 75th percentile of the 2001-2019 distribution for end of the season dates. Finally, we subsampled the start and end of the season datasets with the same regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection. References:

    Baetens, L., Desjardins, C., & Hagolle, O. (2019). Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sensing, 11(4), 433. https://doi.org/10.3390/rs11040433 Coluzzi, R., Imbrenda, V., Lanfredi, M., & Simoniello, T. (2018). A first assessment of the Sentinel-2 Level 1-C cloud mask product to support informed surface analyses. Remote Sensing of Environment, 217, 426–443. https://doi.org/10.1016/j.rse.2018.08.009 Earth Resources Observation And Science (EROS) Center. (1982). Collection-2 Landsat 4-5 Thematic Mapper (TM) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P918ROHC Earth Resources Observation And Science (EROS) Center. (1999). Collection-2 Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P9TU80IG Earth Resources Observation And Science (EROS) Center. (2013). Collection-2 Landsat 8-9 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P975CC9B European Space Agency. (2015). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl European Space Agency. (2021). Sentinel-2 MSI Level-1C TOA Reflectance, Collection 0 [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl European Space Agency. (2022). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-742ikth Foga, S., Scaramuzza, P. L., Guo, S., Zhu, Z., Dilley, R. D., Beckmann, T., Schmidt, G. L., Dwyer, J. L., Joseph Hughes, M., & Laue, B. (2017). Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sensing of Environment, 194, 379–390. https://doi.org/10.1016/j.rse.2017.03.026 Frantz, D., Haß, E., Uhl, A., Stoffels, J., & Hill, J. (2018). Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sensing of Environment, 215, 471–481. https://doi.org/10.1016/j.rse.2018.04.046 Friedl, M., Josh, G., & Sulla-Menashe, D. (2019). MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006 [dataset]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/MODIS/MCD12Q2.006 Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. https://doi.org/10.1016/j.rse.2017.06.031Lewińska K.E., Ernst S., Frantz D., Leser U., Hostert P., Global Overview of Usable Landsat and Sentinel-2 Data for 1982–2023. Data in Brief 57, (2024) https://doi.org/10.1016/j.dib.2024.111054 Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., & Gascon, F. (2017). Sen2Cor for Sentinel-2. In L. Bruzzone, F. Bovolo,

  20. d

    Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment &...

    • datarade.ai
    .json, .csv
    Updated Feb 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex
    Explore at:
    .json, .csvAvailable download formats
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Dataplex
    Area covered
    United States
    Description

    The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

    This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

    Dataset Highlights

    • Location-Specific Reviews – Reviews and ratings for the locations you provide.
    • Daily Updates – New reviews and rating changes updated automatically.
    • Historical Data Access – Retrieve past reviews where available.
    • AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.
    • Competitive Benchmarking – Compare performance across selected locations.

    Use Cases

    • Franchise & Retail Chains – Monitor brand reputation and performance across locations.
    • Hospitality & Restaurants – Track guest sentiment and service trends.
    • Healthcare & Medical Facilities – Understand patient feedback for specific locations.
    • Real Estate & Property Management – Analyze tenant and customer experiences through reviews.
    • Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

    Data Updates & Delivery

    • Update Frequency: Daily
    • Data Format: CSV for easy integration
    • Delivery: Secure file transfer (SFTP or cloud storage)

    Data Fields Include:

    • Business Name
    • Location Details
    • Star Ratings
    • Review Text
    • Timestamps
    • Reviewer Metadata

    Optional Add-Ons:

    • AI Sentiment Analysis – Aggregate trends by week, month, or year.
    • Custom Location Tracking – Tailor the dataset to fit your specific business needs.

    Ideal for

    • Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.
    • Business Analysts – Use structured review data to track customer sentiment over time.
    • Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.
    • Competitive Intelligence – Compare locations and benchmark against industry competitors.

    Why Choose This Dataset?

    • Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.
    • Scalable & Customizable – Track only the locations that matter to you.
    • Actionable Insights – AI-driven summaries for quick decision-making.
    • Easy Integration – Delivered in a structured format for seamless analysis.

    By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
Organization logoOrganization logo

Google Ads Transparency Center

Explore at:
14 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 6, 2023
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Description

This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic and whether the ad is funded by Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may be delayed from what appears on the Google Ads Transparency Center website. About BigQuery This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery . Download Dataset This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed out users can download the full dataset by using the gCloud CLI. Follow the instructions here to download and install the gCloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True" To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R" This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

Search
Clear search
Close search
Google apps
Main menu