100+ datasets found
  1. Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment &...

    • datarade.ai
    .json, .csv
    Updated Feb 3, 2025
    Cite
    Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex
    Available download formats: .json, .csv
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Dataplex
    Area covered
    Grenada, Guinea, Palau, British Indian Ocean Territory, Ethiopia, South Georgia and the South Sandwich Islands, Korea (Democratic People's Republic of), Bhutan, Sweden, French Polynesia
    Description

    The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

    This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

    Dataset Highlights

    • Location-Specific Reviews – Reviews and ratings for the locations you provide.
    • Daily Updates – New reviews and rating changes updated automatically.
    • Historical Data Access – Retrieve past reviews where available.
    • AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.
    • Competitive Benchmarking – Compare performance across selected locations.

    Use Cases

    • Franchise & Retail Chains – Monitor brand reputation and performance across locations.
    • Hospitality & Restaurants – Track guest sentiment and service trends.
    • Healthcare & Medical Facilities – Understand patient feedback for specific locations.
    • Real Estate & Property Management – Analyze tenant and customer experiences through reviews.
    • Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

    Data Updates & Delivery

    • Update Frequency: Daily
    • Data Format: CSV for easy integration
    • Delivery: Secure file transfer (SFTP or cloud storage)

    Data Fields Include:

    • Business Name
    • Location Details
    • Star Ratings
    • Review Text
    • Timestamps
    • Reviewer Metadata

    Optional Add-Ons:

    • AI Sentiment Analysis – Aggregate trends by week, month, or year.
    • Custom Location Tracking – Tailor the dataset to fit your specific business needs.

    Ideal for

    • Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.
    • Business Analysts – Use structured review data to track customer sentiment over time.
    • Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.
    • Competitive Intelligence – Compare locations and benchmark against industry competitors.

    Why Choose This Dataset?

    • Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.
    • Scalable & Customizable – Track only the locations that matter to you.
    • Actionable Insights – AI-driven summaries for quick decision-making.
    • Easy Integration – Delivered in a structured format for seamless analysis.

    By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.

  2. AI Financial Market Data

    • kaggle.com
    Updated Aug 6, 2025
    Cite
    Data Science Lovers (2025). AI Financial Market Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/ai-financial-and-market-data
    Explore at: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Science Lovers
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

    Realistic Synthetic - AI Financial & Market Data for Gemini(Google), ChatGPT(OpenAI), Llama(Meta)

    This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It contains key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, GPT by OpenAI, and Gemini by Google. It records how much the companies spend on R&D for their AI products and services and how much revenue they generate. The data covers January 1, 2015, to December 31, 2024, for three companies: OpenAI, Google, and Meta.

    This data is available as a CSV file. We are going to analyze this data set using the Pandas DataFrame.

    This analysis will be helpful for those working in the finance or stock market domain.

    From this dataset, we extract various insights using Python in our project.

    1) How much the companies spent on R&D

    2) Revenue Earned by the companies

    3) Date-wise Impact on the Stock

    4) Events when Maximum Stock Impact was observed

    5) AI Revenue Growth of the companies

    6) Correlation between the columns

    7) Expenditure vs Revenue year-by-year

    8) Event Impact Analysis

    9) Change in the index with respect to Year & Company

    These are the main Features/Columns available in the dataset :

    1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.

    2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".

    3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.

    4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.

    5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.

    6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.

    7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
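
    The column descriptions above can be exercised with a minimal sketch. The column names come from the list above, but the sample rows are made up for illustration and are not taken from the real file. The description says the project uses Pandas; this sketch swaps in the standard-library `csv` module so that it runs without extra dependencies. It sums R&D spending per company (question 1) and finds the event row with the largest stock impact (question 4).

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using the column names listed above; the values are
# illustrative only and do not come from the real dataset.
SAMPLE_CSV = """Date,Company,R&D_Spending_USD_Mn,AI_Revenue_USD_Mn,AI_Revenue_Growth_%,Event,Stock_Impact_%
2015-01-01,OpenAI,5.0,1.0,0.5,,0.1
2015-01-02,OpenAI,6.0,1.2,20.0,AI partnership deal,2.5
2015-01-01,Meta,40.0,10.0,0.2,,-0.3
2015-01-02,Meta,42.0,10.5,5.0,Cloud AI launch,1.8
"""


def summarize(rows):
    """Return (total R&D spend per company, event row with the largest stock impact)."""
    rd_total = defaultdict(float)
    top_event = None
    for row in rows:
        rd_total[row["Company"]] += float(row["R&D_Spending_USD_Mn"])
        impact = float(row["Stock_Impact_%"])
        # Only rows that actually record an event are candidates.
        if row["Event"] and (
            top_event is None or impact > float(top_event["Stock_Impact_%"])
        ):
            top_event = row
    return dict(rd_total), top_event


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rd_total, top_event = summarize(rows)
print(rd_total)
print(top_event["Company"], top_event["Event"], top_event["Stock_Impact_%"])
```

    To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle. Whether summing the daily R&D figures is meaningful depends on how the synthetic values were generated, which the description does not state.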

  3. Google Landmarks Dataset v2

    • github.com
    • opendatalab.com
    Updated Sep 27, 2019
    Cite
    Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
    Dataset updated
    Sep 27, 2019
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated to two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.

  4. COVID-19 Search Trends symptoms dataset

    • console.cloud.google.com
    Updated Jul 8, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=de&inv=1&invt=Ab4Bvg (2023). COVID-19 Search Trends symptoms dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=de
    Dataset updated
    Jul 8, 2023
    Dataset provided by
    Google Search (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom.

    This dataset is intended to help researchers to better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. To visualize the data, try exploring these interactive charts and map of symptom search trends.

    As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward.

    All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  5. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
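
    The bounce-rate question above can be illustrated with a small worked example. This is a minimal sketch: the flat `(traffic_source, pageviews)` records are hypothetical, and the real BigQuery table stores these values in a different, nested shape. It applies the stated definition, the percentage of visits with a single pageview, grouped by traffic source.

```python
from collections import defaultdict

# Hypothetical session records: (traffic_source, pageviews).
# The flat shape and the values are assumptions for illustration.
sessions = [
    ("google", 1),
    ("google", 4),
    ("direct", 1),
    ("direct", 1),
    ("direct", 3),
    ("youtube", 2),
]


def bounce_rate_by_source(records):
    """Percentage of visits with exactly one pageview, per traffic source."""
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in records:
        visits[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / visits[source] for source in visits}


rates = bounce_rate_by_source(sessions)
for source, rate in sorted(rates.items()):
    print(f"{source}: {rate:.1f}%")
```

    In this sample, one of two "google" visits and two of three "direct" visits have a single pageview, so the rates are 50% and about 66.7%.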

  6. Data from: Google Earth Engine (GEE)

    • data.amerigeoss.org
    • sdgs.amerigeoss.org
    • +6 more
    esri rest, html
    Updated Nov 28, 2018
    Cite
    AmeriGEO ArcGIS (2018). Google Earth Engine (GEE) [Dataset]. https://data.amerigeoss.org/de/dataset/google-earth-engine-gee2
    Available download formats: html, esri rest
    Dataset updated
    Nov 28, 2018
    Dataset provided by
    AmeriGEO ArcGIS
    Description

    Meet Earth Engine

    Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface.

    Satellite imagery + Your algorithms + Causes you care about (real world applications)
    GLOBAL-SCALE INSIGHT

    Explore our interactive timelapse viewer to travel back in time and see how the world has changed over the past twenty-nine years. Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets.

    READY-TO-USE DATASETS

    The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.

    SIMPLE, YET POWERFUL API

    The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis.

    "Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet Google Earth Engine is a great blessing!" - Dr. Andrew Steer, President and CEO of the World Resources Institute.
    CONVENIENT TOOLS

    Use our web-based code editor for fast, interactive algorithm development with instant access to petabytes of data.

    SCIENTIFIC AND HUMANITARIAN IMPACT

    Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more.


  7. Google's Audioset: Reformatted

    • zenodo.org
    • data.niaid.nih.gov
    tsv
    Updated Sep 21, 2022
    Cite
    Bakhtin; Bakhtin (2022). Google's Audioset: Reformatted [Dataset]. http://doi.org/10.5281/zenodo.7096702
    Available download formats: tsv
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bakhtin; Bakhtin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Google's AudioSet consistently reformatted
    
    During my work with Google's AudioSet (https://research.google.com/audioset/index.html),
    I encountered some problems because the Weak (https://research.google.com/audioset/download.html) and
    Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different csv formatting for the data, and because the labels used in the two datasets differ (https://github.com/audioset/ontology/issues/9) and are presented in files with different formatting.
    
    This dataset reformatting aims to unify the formats of the datasets so that it is possible
    to analyse them in the same pipelines, and also make the dataset files compatible
    with psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.
    
    For better formatted documentation and source code of reformatting refer to https://github.com/bakhtos/GoogleAudioSetReformatted 
    
    -Changes in dataset
    
    All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
    as a separator). All files have a header as the first line.
    
    -New fields and filenames
    
    Fields are renamed according to the following table, to be compatible with psds_eval:
    
    Old field -> New field
    YTID -> filename
    segment_id -> filename
    start_seconds -> onset
    start_time_seconds -> onset
    end_seconds -> offset
    end_time_seconds -> offset
    positive_labels -> event_label
    label -> event_label
    present -> present
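
    The renaming table above can be expressed as a plain mapping. This is a minimal sketch rather than the code from the linked repository: the dictionary is copied from the table, while `rename_header` and the two example headers (including their column order) are illustrative assumptions.

```python
# Old -> new field names, copied from the table above.
FIELD_RENAMES = {
    "YTID": "filename",
    "segment_id": "filename",
    "start_seconds": "onset",
    "start_time_seconds": "onset",
    "end_seconds": "offset",
    "end_time_seconds": "offset",
    "positive_labels": "event_label",
    "label": "event_label",
    "present": "present",
}


def rename_header(header):
    """Map each column name through FIELD_RENAMES, leaving unknown names as-is."""
    return [FIELD_RENAMES.get(name, name) for name in header]


# Example headers in the two original styles (column order is illustrative).
weak_header = ["YTID", "start_seconds", "end_seconds", "positive_labels"]
strong_header = ["segment_id", "start_time_seconds", "end_time_seconds", "label"]

print(rename_header(weak_header))
print(rename_header(strong_header))
```

    Both headers map to the same four names, which is what lets the two versions share one pipeline.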
    
    For class label files, `id` is now the name for the `mid` label (e.g. `/m/09xor`)
    and `label` for the human-readable label (e.g. `Speech`). The index of the label given
    for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.
    
    Files are renamed according to the following table to ensure consistent naming
    of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:
    
    Old name -> New name
    balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
    unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
    eval_segments.csv -> audioset_weak_eval.tsv
    audioset_train_strong.tsv -> audioset_strong_train.tsv
    audioset_eval_strong.tsv -> audioset_strong_eval.tsv
    audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
    class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
    mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)
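
    As a quick sanity check, the new data file names in the table can be tested against the stated naming form. This is a sketch under one assumption: the form is read as making the third component (`balanced`, `unbalanced`, or `posneg`) optional, since names such as `audioset_weak_eval.tsv` omit it. The regular expression and the helper name are illustrative, not taken from the repository.

```python
import re

# Old -> new data file names, copied from the table above
# (the merged class_labels.tsv file is left out on purpose).
FILE_RENAMES = {
    "balanced_train_segments.csv": "audioset_weak_train_balanced.tsv",
    "unbalanced_train_segments.csv": "audioset_weak_train_unbalanced.tsv",
    "eval_segments.csv": "audioset_weak_eval.tsv",
    "audioset_train_strong.tsv": "audioset_strong_train.tsv",
    "audioset_eval_strong.tsv": "audioset_strong_eval.tsv",
    "audioset_eval_strong_framed_posneg.tsv": "audioset_strong_eval_posneg.tsv",
}

# Assumed reading of the naming form: the third component is optional.
NAME_PATTERN = re.compile(
    r"audioset_(weak|strong)_(train|eval)(_(balanced|unbalanced|posneg))?\.tsv"
)


def follows_naming_form(name):
    """True if the file name matches the assumed naming form."""
    return NAME_PATTERN.fullmatch(name) is not None


all_match = all(follows_naming_form(name) for name in FILE_RENAMES.values())
print(all_match)
```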
    
    -Strong dataset changes
    
    The only changes to the Strong dataset are the renaming of fields and the reordering of columns,
    so that both the Weak and Strong versions have `filename` and `event_label` as the first
    two columns.
    
    -Weak dataset changes
    
    -- Labels are given one per line, instead of comma-separated and quoted list
    
    -- To make sure that `filename` format is the same as in Strong version, the following
    format change is made:
    The value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and makes `end_seconds` also redundant.
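
    The two Weak dataset changes described above can be sketched as a single row conversion. This is an illustration under assumptions, not the repository's code: the function name, the example video ID, and the example label IDs are hypothetical. It converts `start_seconds` to milliseconds, appends it to the ID with an underscore, and emits one label per output row.

```python
def convert_weak_row(ytid, start_seconds, positive_labels):
    """Yield (filename, event_label) pairs, one per label in the quoted list."""
    # Convert the start time to whole milliseconds, e.g. "30.000" -> 30000.
    start_ms = int(round(float(start_seconds) * 1000))
    filename = f"{ytid}_{start_ms}"
    for label in positive_labels.split(","):
        yield filename, label.strip()


# Hypothetical input row: video ID, start time in seconds, comma-separated labels.
converted = list(convert_weak_row("abc123", "30.000", "/m/label_a,/m/label_b"))
for filename, event_label in converted:
    print(filename, event_label, sep="\t")
```

    The single input row becomes two output rows that share the filename `abc123_30000`.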
    
    -Class labels changes
    
    Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from Strong dataset overwrite those from Weak. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of labels by calling `convert_labels(False)` from convert.py in the GitHub repository.
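
    The merge rule above can be sketched with two dictionaries. This is a hypothetical helper, not the `convert_labels` function from the repository: it takes `{id: label}` mappings with made-up entries, lets one side overwrite the other, and returns rows sorted by `id`.

```python
def merge_labels(weak, strong, strong_priority=True):
    """Merge two {id: label} dicts and return (id, label) rows sorted by id."""
    first, second = (weak, strong) if strong_priority else (strong, weak)
    merged = {**first, **second}  # entries from `second` overwrite `first`
    return sorted(merged.items())


# Made-up label tables; "/m/b" appears in both with different display names.
weak = {"/m/b": "Weak B", "/m/a": "Weak A"}
strong = {"/m/b": "Strong B", "/m/c": "Strong C"}

strong_first = merge_labels(weak, strong)
weak_first = merge_labels(weak, strong, strong_priority=False)
print(strong_first)
print(weak_first)
```

    With the default, the shared `id` keeps the Strong label; passing `strong_priority=False` keeps the Weak one instead.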
    
    -License
    
    Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)
    
    Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
    

    Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

  8. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
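
    When working with query results, the placeholder described in the limitations can be normalized to a single missing-value marker. This is a minimal sketch assuming rows arrive as plain dictionaries; the helper name, field names, and values are hypothetical. The comparison ignores case because the exact capitalization of the placeholder in returned data is not confirmed here.

```python
PLACEHOLDER = "Not available in demo dataset"


def normalize_missing(row):
    """Replace the demo-dataset placeholder string with None; keep other values."""
    cleaned = {}
    for key, value in row.items():
        is_placeholder = (
            isinstance(value, str)
            and value.strip().lower() == PLACEHOLDER.lower()
        )
        cleaned[key] = None if is_placeholder else value
    return cleaned


# Hypothetical result row: field names and values are illustrative only.
example_row = {
    "some_id_field": "1234567890",
    "some_string_field": "not available in demo dataset",
    "some_integer_field": None,
}
cleaned_row = normalize_missing(example_row)
print(cleaned_row)
```

    After this step both kinds of empty field are `None`, so later code needs only one check.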

  9. Outscraper Google Maps Scraper

    • datarade.ai
    .json, .csv, .xls
    Updated Dec 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Outscraper Google Maps Scraper [Dataset]. https://datarade.ai/data-products/outscraper-google-maps-scraper-outscraper
    Available download formats: .json, .csv, .xls
    Dataset updated
    Dec 9, 2021
    Area covered
    Cameroon, Sint Eustatius and Saba, United States Minor Outlying Islands, Western Sahara, Guyana, Botswana, Egypt, Zimbabwe, Uruguay, Mayotte
    Description

    Are you looking to identify B2B leads to promote your business, product, or service? Outscraper Google Maps Scraper might just be the tool you've been searching for. This powerful software enables you to extract business data directly from Google's extensive database, which spans millions of businesses across countless industries worldwide.

    Outscraper Google Maps Scraper is a tool built with advanced technology that lets you scrape a myriad of valuable information about businesses from Google's database. This information includes but is not limited to, business names, addresses, contact information, website URLs, reviews, ratings, and operational hours.

    Whether you are a small business trying to make a mark or a large enterprise exploring new territories, the data obtained from the Outscraper Google Maps Scraper can be a treasure trove. This tool provides a cost-effective, efficient, and accurate method to generate leads and gather market insights.

    By using Outscraper, you'll gain a significant competitive edge as it allows you to analyze your market and find potential B2B leads with precision. You can use this data to understand your competitors' landscape, discover new markets, or enhance your customer database. The tool offers the flexibility to extract data based on specific parameters like business category or geographic location, helping you to target the most relevant leads for your business.

    In a world that's growing increasingly data-driven, utilizing a tool like Outscraper Google Maps Scraper could be instrumental to your business' success. If you're looking to get ahead in your market and find B2B leads in a more efficient and precise manner, Outscraper is worth considering. It streamlines the data collection process, allowing you to focus on what truly matters – using the data to grow your business.

    https://outscraper.com/google-maps-scraper/

    As a result of the Google Maps scraping, your data file will contain the following details:

    Query, Name, Site, Type, Subtypes, Category, Phone, Full Address, Borough, Street, City, Postal Code, State, Us State, Country, Country Code, Latitude, Longitude, Time Zone, Plus Code, Rating, Reviews, Reviews Link, Reviews Per Scores, Photos Count, Photo, Street View, Working Hours, Working Hours Old Format, Popular Times, Business Status, About, Range, Posts, Verified, Owner ID, Owner Title, Owner Link, Reservation Links, Booking Appointment Link, Menu Link, Order Links, Location Link, Place ID, Google ID, Reviews ID

    If you want to enrich your datasets with social media accounts and many more details you could combine Google Maps Scraper with Domain Contact Scraper.

    Domain Contact Scraper can scrape these details:

    Email, Facebook, Github, Instagram, Linkedin, Phone, Twitter, Youtube

  10. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unsplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

  11. Google Street View Store (with Rotation) Dataset

    • universe.roboflow.com
    zip
    Updated May 24, 2022
    Cite
    Pigeon (2022). Google Street View Store (with Rotation) Dataset [Dataset]. https://universe.roboflow.com/pigeon/google-street-view-store-dataset--with-rotation/dataset/1
    Available download formats: zip
    Dataset updated
    May 24, 2022
    Dataset authored and provided by
    Pigeon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Store Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Retail Analysis and Mapping: Using the "Google Street View Store Dataset (With Rotation)", businesses and researchers can analyze the distribution of different store types, identify areas with a high concentration of specific stores, and visualize the layout of retail landscapes within cities or regions.

    2. Store Accessibility Assessment: City planners and disability advocacy organizations can use the dataset to evaluate the accessibility of stores and shopping areas for individuals with disabilities, considering factors such as store locations, entrances, and nearby parking facilities.

    3. Competitor Analysis and Strategic Planning: Companies can use the dataset to identify the locations of competitors' stores and assess their market presence in specific areas. This can aid in making important strategic decisions, such as targeting under-served areas or launching new stores.

    4. Real Estate Investment and Development: Real estate investors and developers can use the dataset to find promising areas for commercial development, identify potential retail spaces, and make informed investment decisions based on the store distribution in neighborhoods.

    5. Augmented Reality Applications: Developers of AR applications can use the dataset to create AR experiences that provide information about nearby stores, such as store ratings, opening hours, and special offers, to users in real time as they navigate through the streets using their devices.

  12. civil_comments

    • tensorflow.org
    • huggingface.co
    Updated Feb 28, 2023
    Cite
    (2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
    Explore at:
    Dataset updated
    Feb 28, 2023
    Description

    This version of the CivilComments dataset provides access to the seven primary labels annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators who assigned each attribute to the comment text.
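    As a rough illustration of how a fractional label like this can be derived and used, the sketch below computes the fraction of annotators who assigned an attribute and then binarizes it. The helper names, the example votes, and the 0.5 cutoff are assumptions made for illustration; they are not part of the dataset or its documentation.

```python
from typing import Sequence


def label_fraction(votes: Sequence[bool]) -> float:
    """Return the fraction of annotators who assigned the attribute."""
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(votes) / len(votes)


def to_binary(fraction: float, threshold: float = 0.5) -> int:
    """Binarize a fractional label; the 0.5 threshold is an assumption."""
    return int(fraction >= threshold)


# Hypothetical votes from four annotators for one comment.
votes = [True, False, True, True]
toxicity = label_fraction(votes)
print(toxicity, to_binary(toxicity))
```

    Keeping the fractional value instead of the binarized one preserves annotator disagreement, which may matter when training with soft targets.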

    The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

    The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 to 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text and some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user IDs. Jigsaw extended this dataset by adding labels for toxicity, identity mentions, and covert offensiveness. This dataset is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

    For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.
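    One way to gauge the leakage mentioned above is to count evaluation comments whose parent comment sits in the training split. The sketch below works on plain dictionaries; the "id" field name and the toy records are assumptions for illustration, and only "parent_id" is named in the description.

```python
def cross_split_parents(train, test):
    """Return ids of test comments whose parent comment is in the train split."""
    train_ids = {row["id"] for row in train}
    return [row["id"] for row in test if row.get("parent_id") in train_ids]


# Hypothetical records: comment 3 replies to comment 2, which is in train.
train = [{"id": 1, "parent_id": None}, {"id": 2, "parent_id": 1}]
test = [{"id": 3, "parent_id": 2}, {"id": 4, "parent_id": None}]
print(cross_split_parents(train, test))
```

    Comments flagged this way could be excluded from evaluation, or have their parent text dropped, depending on how strict the comparison needs to be.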

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('civil_comments', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  13. Google SERP Data, Web Search Data, Google Images Data | Real-Time API

    • datarade.ai
    .json, .csv
    Updated May 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenWeb Ninja (2024). Google SERP Data, Web Search Data, Google Images Data | Real-Time API [Dataset]. https://datarade.ai/data-products/openweb-ninja-google-data-google-image-data-google-serp-d-openweb-ninja
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    May 17, 2024
    Dataset authored and provided by
    OpenWeb Ninja
    Area covered
    Panama, Ireland, Barbados, South Georgia and the South Sandwich Islands, Burundi, Tokelau, Grenada, Virgin Islands (U.S.), Uruguay, Uganda
    Description

    OpenWeb Ninja's Google Images Data (Google SERP Data) API provides real-time image search capabilities for images sourced from all public sources on the web.

    The API enables you to search and access more than 100 billion images from across the web including advanced filtering capabilities as supported by Google Advanced Image Search. The API provides Google Images Data (Google SERP Data) including details such as image URL, title, size information, thumbnail, source information, and more data points. The API supports advanced filtering and options such as file type, image color, usage rights, creation time, and more. In addition, any Advanced Google Search operators can be used with the API.

    OpenWeb Ninja's Google Images Data & Google SERP Data API common use cases:

    • Creative Media Production: Enhance digital content with a vast array of real-time images, ensuring engaging and brand-aligned visuals for blogs, social media, and advertising.

    • AI Model Enhancement: Train and refine AI models with diverse, annotated images, improving object recognition and image classification accuracy.

    • Trend Analysis: Identify emerging market trends and consumer preferences through real-time visual data, enabling proactive business decisions.

    • Innovative Product Design: Inspire product innovation by exploring current design trends and competitor products, ensuring market-relevant offerings.

    • Advanced Search Optimization: Improve search engines and applications with enriched image datasets, providing users with accurate, relevant, and visually appealing search results.

    OpenWeb Ninja's Annotated Imagery Data & Google SERP Data Stats & Capabilities:

    • 100B+ Images: Access an extensive database of over 100 billion images.

    • Images Data from all Public Sources (Google SERP Data): Benefit from a comprehensive aggregation of image data from various public websites, ensuring a wide range of sources and perspectives.

    • Extensive Search and Filtering Capabilities: Utilize advanced search operators and filters to refine image searches by file type, color, usage rights, creation time, and more, making it easy to find exactly what you need.

    • Rich Data Points: Each image comes with more than 10 data points, including URL, title (annotation), size information, thumbnail, and source information, providing a detailed context for each image.

  14. rampnet-crop-model-dataset

    • huggingface.co
    Updated Jul 15, 2025
    + more versions
    Cite
    Project Sidewalk (2025). rampnet-crop-model-dataset [Dataset]. https://huggingface.co/datasets/projectsidewalk/rampnet-crop-model-dataset
    Explore at:
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Project Sidewalk
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    RampNet is a two-stage pipeline that addresses the scarcity of curb ramp detection datasets by using government location data to automatically generate over 210,000 annotated Google Street View panoramas. This new dataset is then used to train a state-of-the-art curb ramp detection model that significantly outperforms previous efforts. In this repo, we provide "the tiny set of manually labeled crops" that we refer to in both RampNet's GitHub repository and the paper. It contains test, train… See the full description on the dataset page: https://huggingface.co/datasets/projectsidewalk/rampnet-crop-model-dataset.

  15. Replication Data for: A Study for Scholarly Impacts of International...

    • dataone.org
    • dataverse.harvard.edu
    • +1 more
    Updated Nov 22, 2023
    Cite
    Balci, Ali; Filiz Cicioglu; Duygu Kalkan (2023). Replication Data for: A Study for Scholarly Impacts of International Relations Academics and Departments in Turkey through Google Scholar Data [Dataset]. http://doi.org/10.7910/DVN/EZTVWV
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Balci, Ali; Filiz Cicioglu; Duygu Kalkan
    Area covered
    Türkiye
    Description

    Since computers made it possible to collect and evaluate large datasets, there has been a significant increase in studies measuring the impact of academics. This study aims to analyse International Relations scholars and departments in Turkey using Google Scholar citation counts. Through this measurement, the study generates a new ranking list as an alternative to existing measurement lists. As a control, the Google-generated ranking lists are compared with data from the Social Science Citation Index (SSCI). Thus, the study aims to make a data-based contribution to the quality assessment literature, which has become increasingly popular in Turkey.

  16. State of Iowa Google My Business Profile Analytics by Month

    • catalog.data.gov
    • s.cnmilf.com
    • +3 more
    Updated Jul 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.iowa.gov (2024). State of Iowa Google My Business Profile Analytics by Month [Dataset]. https://catalog.data.gov/dataset/state-of-iowa-google-my-business-profile-analytics-by-month
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    data.iowa.gov
    Area covered
    Iowa
    Description

    This dataset provides insights by month on how people find State of Iowa agency listings on the web via Google Search and Maps, and what they do once they find it to include providing reviews (ratings), accessing agency websites, requesting directions, and making calls.

  17. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset provided by
    oxylabs, UAB
    Authors
    Oxylabs
    Area covered
    Nepal, Tunisia, Northern Mariana Islands, Bangladesh, British Indian Ocean Territory, Andorra, Moldova (Republic of), Taiwan, Isle of Man, Canada
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  18. frames-benchmark

    • huggingface.co
    Cite
    Google, frames-benchmark [Dataset]. https://huggingface.co/datasets/google/frames-benchmark
    Explore at:
    Dataset authored and provided by
    Google (http://google.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    FRAMES: Factuality, Retrieval, And reasoning MEasurement Set

    FRAMES is a comprehensive evaluation dataset designed to test the capabilities of Retrieval-Augmented Generation (RAG) systems across factuality, retrieval accuracy, and reasoning. Our paper with details and experiments is available on arXiv: https://arxiv.org/abs/2409.12941.

    Dataset Overview

    • 824 challenging multi-hop questions requiring information from 2-15 Wikipedia articles
    • Questions span diverse topics…

    See the full description on the dataset page: https://huggingface.co/datasets/google/frames-benchmark.

  19. rlu_atari_checkpoints_ordered

    • tensorflow.org
    Updated Apr 8, 2022
    + more versions
    Cite
    (2022). rlu_atari_checkpoints_ordered [Dataset]. https://www.tensorflow.org/datasets/catalog/rlu_atari_checkpoints_ordered
    Explore at:
    Dataset updated
    Apr 8, 2022
    Description

    RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.

    The datasets follow the RLDS format to represent steps and episodes.

    We are releasing a large and diverse dataset of gameplay following the protocol described by Agarwal et al., 2020, which can be used to evaluate several discrete offline RL algorithms. The dataset is generated by running an online DQN agent and recording transitions from its replay during training with sticky actions (Machado et al., 2018). As stated in Agarwal et al., 2020, for each game we use data from five runs with 50 million transitions each. We release datasets for 46 Atari games. For details on how the dataset was generated, please refer to the paper. Please see this note about the ROM versions used to generate the datasets.

    Atari is a standard RL benchmark. We recommend trying offline RL methods on Atari if you are interested in comparing your approach to other state-of-the-art offline RL methods with discrete actions.

    The reward of each step is clipped to [-1, 1], and each episode includes the sum of its clipped rewards.
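    The clipping described here can be sketched in a few lines. The function names and the example rewards below are hypothetical and only illustrate the arithmetic; they do not reflect how the dataset itself was produced.

```python
def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clip a single step reward to the interval [low, high]."""
    return max(low, min(high, reward))


def clipped_episode_return(rewards) -> float:
    """Sum the clipped per-step rewards of one episode."""
    return sum(clip_reward(r) for r in rewards)


# Hypothetical raw rewards for a five-step episode.
raw = [0.0, 4.0, -7.0, 1.0, 10.0]
print([clip_reward(r) for r in raw])
print(clipped_episode_return(raw))
```

    Clipping puts reward magnitudes on a common scale across games, at the cost of hiding how large the original rewards were.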

    Each of the configurations is broken into splits. Splits correspond to checkpoints of 1M steps (note that the number of episodes may differ). Checkpoints are ordered in time (so checkpoint 0 ran before checkpoint 1).

    Episodes within each split are ordered. Check https://www.tensorflow.org/datasets/determinism if you want to ensure that you read episodes in order.

    This dataset corresponds to the one used in the DQN replay paper. https://research.google/tools/datasets/dqn-replay/

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('rlu_atari_checkpoints_ordered', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  20. SEC Public Dataset

    • console.cloud.google.com
    Updated Jul 27, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=ko (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=ko
    Explore at:
    Dataset updated
    Jul 27, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
