100+ datasets found

Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment &...
datarade.ai
.json, .csv
Updated Feb 3, 2025
Cite
Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex
Explore at:
Available download formats: .json, .csv
Dataset updated
Feb 3, 2025
Dataset authored and provided by
Dataplex
Area covered
Grenada, Guinea, Palau, British Indian Ocean Territory, Ethiopia, South Georgia and the South Sandwich Islands, Korea (Democratic People's Republic of), Bhutan, Sweden, French Polynesia
Description
The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

Dataset Highlights

Location-Specific Reviews – Reviews and ratings for the locations you provide.

Daily Updates – New reviews and rating changes updated automatically.

Historical Data Access – Retrieve past reviews where available.

AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.

Competitive Benchmarking – Compare performance across selected locations.

Use Cases

Franchise & Retail Chains – Monitor brand reputation and performance across locations.

Hospitality & Restaurants – Track guest sentiment and service trends.

Healthcare & Medical Facilities – Understand patient feedback for specific locations.

Real Estate & Property Management – Analyze tenant and customer experiences through reviews.

Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

Data Updates & Delivery

Update Frequency: Daily

Data Format: CSV for easy integration

Delivery: Secure file transfer (SFTP or cloud storage)

Data Fields Include:

Business Name

Location Details

Star Ratings

Review Text

Timestamps

Reviewer Metadata

Optional Add-Ons:

AI Sentiment Analysis – Aggregate trends by week, month, or year.

Custom Location Tracking – Tailor the dataset to fit your specific business needs.

Ideal for

Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.

Business Analysts – Use structured review data to track customer sentiment over time.

Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.

Competitive Intelligence – Compare locations and benchmark against industry competitors.

Why Choose This Dataset?

Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.

Scalable & Customizable – Track only the locations that matter to you.

Actionable Insights – AI-driven summaries for quick decision-making.

Easy Integration – Delivered in a structured format for seamless analysis.

By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.
AI Financial Market Data
kaggle.com
Updated Aug 6, 2025
Cite
Data Science Lovers (2025). AI Financial Market Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/ai-financial-and-market-data
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 6, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Data Science Lovers
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

Realistic Synthetic - AI Financial & Market Data for Gemini(Google), ChatGPT(OpenAI), Llama(Meta)

This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It contains key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, the launch of GPT by OpenAI, and the launch of Gemini by Google. It records how much the companies spend on R&D for their AI products and services, and how much revenue they generate. The data runs from January 1, 2015, to December 31, 2024, and covers three companies: OpenAI, Google and Meta.

This data is available as a CSV file. We are going to analyze this data set using the Pandas DataFrame.

This analysis will be helpful for those working in the finance or share market domain.

From this dataset, we extract various insights using Python in our Project.

1) How much did the companies spend on R&D?

2) Revenue Earned by the companies

3) Date-wise Impact on the Stock

4) Events when Maximum Stock Impact was observed

5) AI Revenue Growth of the companies

6) Correlation between the columns

7) Expenditure vs Revenue year-by-year

8) Event Impact Analysis

9) Change in the index wrt Year & Company

These are the main Features/Columns available in the dataset :

1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.

2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".

3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.

4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.

5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.

6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.

7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
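
As a rough illustration of questions 1, 2 and 4 above, the sketch below aggregates rows by the `Company` column and finds the row with the largest `Stock_Impact_%`. It uses only the Python standard library rather than Pandas so that it runs anywhere. The column names are the ones listed above; the three sample rows and their values are invented for illustration and are not taken from the dataset. It also assumes that daily spending and revenue figures can simply be summed, which the description does not confirm.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows (NOT real data) using the column names described above.
SAMPLE_CSV = """Date,Company,R&D_Spending_USD_Mn,AI_Revenue_USD_Mn,AI_Revenue_Growth_%,Event,Stock_Impact_%
2015-01-01,OpenAI,5.0,1.0,0.5,,0.1
2015-01-02,OpenAI,6.0,1.5,50.0,AI partnership deal,2.0
2015-01-01,Meta,80.0,20.0,1.0,,-0.3
"""


def totals_by_company(csv_text):
    """Sum R&D spending and AI revenue (USD millions) per company."""
    totals = defaultdict(lambda: {"rd": 0.0, "revenue": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Company"]]["rd"] += float(row["R&D_Spending_USD_Mn"])
        totals[row["Company"]]["revenue"] += float(row["AI_Revenue_USD_Mn"])
    return dict(totals)


def max_stock_impact_event(csv_text):
    """Return (date, company, event) for the row with the largest Stock_Impact_%."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda r: float(r["Stock_Impact_%"]))
    return best["Date"], best["Company"], best["Event"]


if __name__ == "__main__":
    print(totals_by_company(SAMPLE_CSV))
    print(max_stock_impact_event(SAMPLE_CSV))
```

With the real CSV file, the same grouping could be expressed as a Pandas `groupby` on `Company`; the standard-library version is shown only to keep the sketch dependency-free.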
Google Landmarks Dataset v2
github.com
opendatalab.com
Updated Sep 27, 2019
Cite
Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
Explore at:
Dataset updated
Sep 27, 2019
Dataset provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
COVID-19 Search Trends symptoms dataset
console.cloud.google.com
Updated Jul 8, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=de&inv=1&invt=Ab4Bvg (2023). COVID-19 Search Trends symptoms dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=de
Explore at:
Dataset updated
Jul 8, 2023
Dataset provided by
Google Search (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Description
The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. This dataset is intended to help researchers to better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. To visualize the data, try exploring these interactive charts and map of symptom search trends.

As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward.

All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
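
The "real bounce rate" defined above (the percentage of visits with a single pageview) can be sketched in a few lines. The input shape used here, a list of `(traffic_source, pageviews)` pairs, and the sample values are assumptions made for illustration; they do not reflect the actual BigQuery schema of the sample dataset.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {source: percentage of visits with exactly one pageview}.

    `sessions` is an iterable of (traffic_source, pageviews) pairs; this
    flat shape is a hypothetical simplification of the real session table.
    """
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in sessions:
        visits[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / visits[source] for source in visits}


if __name__ == "__main__":
    # Invented sessions, not taken from the dataset.
    sample = [("google", 1), ("google", 3), ("(direct)", 1), ("(direct)", 1)]
    print(bounce_rate_by_source(sample))
```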
Data from: Google Earth Engine (GEE)
data.amerigeoss.org
sdgs.amerigeoss.org
esri rest, html
Updated Nov 28, 2018
Cite
AmeriGEO ArcGIS (2018). Google Earth Engine (GEE) [Dataset]. https://data.amerigeoss.org/de/dataset/google-earth-engine-gee2
Explore at:
Available download formats: html, esri rest
Dataset updated
Nov 28, 2018
Dataset provided by
AmeriGEO ArcGIS
Description
Meet Earth Engine
Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface.
SATELLITE IMAGERY + YOUR ALGORITHMS + REAL WORLD APPLICATIONS
GLOBAL-SCALE INSIGHT
Explore our interactive timelapse viewer to travel back in time and see how the world has changed over the past twenty-nine years. Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets.
READY-TO-USE DATASETS
The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.
SIMPLE, YET POWERFUL API
The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis.
"Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet, Google Earth Engine is a great blessing!" - Dr. Andrew Steer, President and CEO of the World Resources Institute.
CONVENIENT TOOLS
Use our web-based code editor for fast, interactive algorithm development with instant access to petabytes of data.
SCIENTIFIC AND HUMANITARIAN IMPACT
Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more.

Google's Audioset: Reformatted

zenodo.org
data.niaid.nih.gov

tsv

Updated Sep 21, 2022

Cite

Bakhtin; Bakhtin (2022). Google's Audioset: Reformatted [Dataset]. http://doi.org/10.5281/zenodo.7096702

Explore at:

Available download formats: tsv

Unique identifier

https://doi.org/10.5281/zenodo.7096702

Dataset updated

Sep 21, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Bakhtin; Bakhtin

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Google's AudioSet consistently reformatted

During my work with Google's AudioSet (https://research.google.com/audioset/index.html)
I encountered some problems because the Weak (https://research.google.com/audioset/download.html) and
Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different csv formatting for the data, and because the labels used in the two datasets differ (https://github.com/audioset/ontology/issues/9) and are presented in files with different formatting.

This dataset reformatting aims to unify the formats of the datasets so that it is possible
to analyse them in the same pipelines, and also make the dataset files compatible
with psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.

For better formatted documentation and source code of reformatting refer to https://github.com/bakhtos/GoogleAudioSetReformatted 

-Changes in dataset

All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
as a separator). All files have a header as the first line.

-New fields and filenames

Fields are renamed according to the following table, to be compatible with psds_eval:

Old field -> New field
YTID -> filename
segment_id -> filename
start_seconds -> onset
start_time_seconds -> onset
end_seconds -> offset
end_time_seconds -> offset
positive_labels -> event_label
label -> event_label
present -> present

For class label files, `id` is now the name of the field for the `mid` label (e.g. `/m/09xor`)
and `label` the name of the field for the human-readable label (e.g. `Speech`). The label index given
for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.

Files are renamed according to the following table to ensure consistent naming
of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:

Old name -> New name
balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
eval_segments.csv -> audioset_weak_eval.tsv
audioset_train_strong.tsv -> audioset_strong_train.tsv
audioset_eval_strong.tsv -> audioset_strong_eval.tsv
audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)

-Strong dataset changes

The only changes to the Strong dataset are the renaming of fields and the reordering of columns,
so that both the Weak and Strong versions have `filename` and `event_label` as the first
two columns.

-Weak dataset changes

-- Labels are given one per line, instead of comma-separated and quoted list

-- To make sure that `filename` format is the same as in Strong version, the following
format change is made:
The value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and makes `end_seconds` also redundant.
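
The two Weak dataset changes described above can be sketched as a single row transformation. This is an illustrative reconstruction written from the description only, not the code from the linked repository; the function name `weak_row_to_rows` and the sample identifiers are hypothetical.

```python
def weak_row_to_rows(ytid, start_seconds, positive_labels):
    """Turn one Weak-format row into (filename, event_label) rows.

    - `start_seconds` is converted to milliseconds and appended to the
      YouTube ID with an underscore to form `filename`.
    - The comma-separated label list is split so that each label gets
      its own row.
    """
    start_ms = int(round(float(start_seconds) * 1000))
    filename = f"{ytid}_{start_ms}"
    labels = [label.strip() for label in positive_labels.split(",") if label.strip()]
    return [(filename, label) for label in labels]


if __name__ == "__main__":
    # Hypothetical ID and labels, used only to show the output shape.
    for row in weak_row_to_rows("abc123", "30.000", "/m/aaa,/m/bbb"):
        print("\t".join(row))
```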

-Class labels changes

Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from Strong dataset overwrite those from Weak. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of labels by calling `convert_labels(False)` from convert.py in the GitHub repository.
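
The merge rule described above (one file, sorted by `id`, with one dataset's labels taking priority on conflicts) can be illustrated with plain dictionaries. This is a sketch of the rule only and is not the repository's `convert_labels` implementation; the function name `merge_class_labels`, its `strong_priority` parameter, and the sample ids and labels are hypothetical.

```python
def merge_class_labels(weak, strong, strong_priority=True):
    """Merge two {id: label} mappings into a list sorted by id.

    When the same id appears in both mappings, the Strong label wins if
    `strong_priority` is True, otherwise the Weak label wins.
    """
    if strong_priority:
        merged = {**weak, **strong}  # later mapping overwrites earlier
    else:
        merged = {**strong, **weak}
    return sorted(merged.items())


if __name__ == "__main__":
    # Hypothetical ids and labels, used only to show the conflict handling.
    weak = {"/m/bbb": "Dog bark", "/m/aaa": "Speech"}
    strong = {"/m/bbb": "Bark", "/m/ccc": "Siren"}
    print(merge_class_labels(weak, strong))
    print(merge_class_labels(weak, strong, strong_priority=False))
```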

-License

Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)

Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

Google Analytics Sample
console.cloud.google.com
Updated Jul 15, 2017
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data
Explore at:
Dataset updated
Jul 15, 2017
Dataset provided by
Google (http://google.com/)
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It’s a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data. The data is typical of what an ecommerce website would see and includes the following information:

Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.

Content data: information about the behavior of users on the site, such as URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions on the Google Merchandise Store website.

Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo and geoNetwork. “Not available in demo dataset” will be returned for STRING values and “null” will be returned for INTEGER values when querying the fields containing no data.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Outscraper Google Maps Scraper
datarade.ai
.json, .csv, .xls
Updated Dec 9, 2021
Cite
(2021). Outscraper Google Maps Scraper [Dataset]. https://datarade.ai/data-products/outscraper-google-maps-scraper-outscraper
Explore at:
Available download formats: .json, .csv, .xls
Dataset updated
Dec 9, 2021
Area covered
Cameroon, Sint Eustatius and Saba, United States Minor Outlying Islands, Western Sahara, Guyana, Botswana, Egypt, Zimbabwe, Uruguay, Mayotte
Description
Are you looking to identify B2B leads to promote your business, product, or service? Outscraper Google Maps Scraper might just be the tool you've been searching for. This powerful software enables you to extract business data directly from Google's extensive database, which spans millions of businesses across countless industries worldwide.

Outscraper Google Maps Scraper is a tool built with advanced technology that lets you scrape a myriad of valuable information about businesses from Google's database. This information includes but is not limited to, business names, addresses, contact information, website URLs, reviews, ratings, and operational hours.

Whether you are a small business trying to make a mark or a large enterprise exploring new territories, the data obtained from the Outscraper Google Maps Scraper can be a treasure trove. This tool provides a cost-effective, efficient, and accurate method to generate leads and gather market insights.

By using Outscraper, you'll gain a significant competitive edge as it allows you to analyze your market and find potential B2B leads with precision. You can use this data to understand your competitors' landscape, discover new markets, or enhance your customer database. The tool offers the flexibility to extract data based on specific parameters like business category or geographic location, helping you to target the most relevant leads for your business.

In a world that's growing increasingly data-driven, utilizing a tool like Outscraper Google Maps Scraper could be instrumental to your business' success. If you're looking to get ahead in your market and find B2B leads in a more efficient and precise manner, Outscraper is worth considering. It streamlines the data collection process, allowing you to focus on what truly matters – using the data to grow your business.

https://outscraper.com/google-maps-scraper/

As a result of the Google Maps scraping, your data file will contain the following details:

Query, Name, Site, Type, Subtypes, Category, Phone, Full Address, Borough, Street, City, Postal Code, State, Us State, Country, Country Code, Latitude, Longitude, Time Zone, Plus Code, Rating, Reviews, Reviews Link, Reviews Per Scores, Photos Count, Photo, Street View, Working Hours, Working Hours Old Format, Popular Times, Business Status, About, Range, Posts, Verified, Owner ID, Owner Title, Owner Link, Reservation Links, Booking Appointment Link, Menu Link, Order Links, Location Link, Place ID, Google ID, Reviews ID

If you want to enrich your datasets with social media accounts and many more details you could combine Google Maps Scraper with Domain Contact Scraper.

Domain Contact Scraper can scrape these details:

Email, Facebook, Github, Instagram, Linkedin, Phone, Twitter, Youtube
NYC Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
NYC Open Data
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

Content

Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

Over 8 million 311 service requests from 2012-2016

More than 1 million motor vehicle collisions 2012-present

Citi Bike stations and 30 million Citi Bike trips 2013-present

Over 1 billion Yellow and Green Taxi rides from 2009-present

Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

https://opendata.cityofnewyork.us/

https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

Banner Photo by @bicadmedia from Unsplash.

Inspiration

On which New York City streets are you most likely to find a loud party?

Can you find the Virginia Pines in New York City?

Where was the only collision caused by an animal that injured a cyclist?

What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
Google Street View Store (with Rotation) Dataset
universe.roboflow.com
zip
Updated May 24, 2022
Cite
Pigeon (2022). Google Street View Store (with Rotation) Dataset [Dataset]. https://universe.roboflow.com/pigeon/google-street-view-store-dataset--with-rotation/dataset/1
Explore at:
Available download formats: zip
Dataset updated
May 24, 2022
Dataset authored and provided by
Pigeon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Store Bounding Boxes
Description
Here are a few use cases for this project:

Retail Analysis and Mapping: Using the "Google Street View Store Dataset (With Rotation)", businesses and researchers can analyze the distribution of different store types, identify areas with a high concentration of specific stores, and visualize the layout of retail landscapes within cities or regions.

Store Accessibility Assessment: City planners and disability advocacy organizations can use the dataset to evaluate the accessibility of stores and shopping areas for individuals with disabilities, considering factors such as store locations, entrances, and nearby parking facilities.

Competitor Analysis and Strategic Planning: Companies can use the dataset to identify the locations of competitors' stores and assess their market presence in specific areas. This can aid in making important strategic decisions, such as targeting under-served areas or launching new stores.

Real Estate Investment and Development: Real estate investors and developers can use the dataset to find promising areas for commercial development, identify potential retail spaces, and make informed investment decisions based on the store distribution in neighborhoods.

Augmented Reality Applications: Developers of AR applications can use the dataset to create AR experiences that provide information about nearby stores, such as store ratings, opening hours, and special offers, to users in real time as they navigate through the streets using their devices.
civil_comments
tensorflow.org
huggingface.co
Updated Feb 28, 2023
Cite
(2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
Explore at:
Dataset updated
Feb 28, 2023
Description
This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
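
To make the "fraction of annotators" scoring concrete, the sketch below computes such a fraction from individual annotator votes and turns it into a binary label. The helper names are hypothetical, the votes are invented, and the 0.5 cut-off is an assumption chosen for illustration; the description above does not prescribe a threshold.

```python
def annotator_fraction(votes):
    """Fraction of annotators (0.0 to 1.0) who assigned the attribute.

    `votes` is a non-empty list of booleans, one per annotator.
    """
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(votes) / len(votes)


def is_positive(fraction, threshold=0.5):
    """Binarize a fractional label; the default threshold is an assumption."""
    return fraction >= threshold


if __name__ == "__main__":
    votes = [True, False, False, True]  # invented votes from four annotators
    score = annotator_fraction(votes)
    print(score, is_positive(score))
```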

The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created between 2015 and 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text and some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user IDs. Jigsaw extended this dataset by adding labels for toxicity, identity mentions, and covert offensiveness. This dataset is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.
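
One way to gauge the leakage risk described above is to check whether a parent comment used as context in one split also appears as a comment in another split. The sketch below does this on plain Python dictionaries. The "parent_text" key comes from the description; the "text" key and the sample records are assumptions for illustration, and real examples loaded through tensorflow_datasets would need to be converted to strings first.

```python
from typing import Dict, Iterable, List


def find_parent_leaks(train: Iterable[Dict[str, str]],
                      test: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return test examples whose parent text also appears as a training comment."""
    train_texts = {ex["text"] for ex in train}
    return [
        ex for ex in test
        if ex.get("parent_text") and ex["parent_text"] in train_texts
    ]


# Hypothetical records; an empty parent_text means no parent is available.
train = [{"text": "Great article.", "parent_text": ""}]
test = [
    {"text": "I disagree.", "parent_text": "Great article."},
    {"text": "First!", "parent_text": ""},
]

leaks = find_parent_leaks(train, test)
print(len(leaks))
```

Examples flagged this way could be dropped from evaluation, or evaluated without their parent text.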

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('civil_comments', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Google SERP Data, Web Search Data, Google Images Data | Real-Time API
datarade.ai
.json, .csv
Updated May 17, 2024
Cite
OpenWeb Ninja (2024). Google SERP Data, Web Search Data, Google Images Data | Real-Time API [Dataset]. https://datarade.ai/data-products/openweb-ninja-google-data-google-image-data-google-serp-d-openweb-ninja
Available download formats: .json, .csv
Dataset updated
May 17, 2024
Dataset authored and provided by
OpenWeb Ninja
Area covered
Panama, Ireland, Barbados, South Georgia and the South Sandwich Islands, Burundi, Tokelau, Grenada, Virgin Islands (U.S.), Uruguay, Uganda
Description
OpenWeb Ninja's Google Images Data (Google SERP Data) API provides real-time image search capabilities for images sourced from all public sources on the web.

The API enables you to search and access more than 100 billion images from across the web, with the advanced filtering capabilities supported by Google Advanced Image Search. The API returns Google Images Data (Google SERP Data) with details such as image URL, title, size information, thumbnail, source information, and more. It supports advanced filters and options such as file type, image color, usage rights, and creation time. In addition, any Advanced Google Search operators can be used with the API.

OpenWeb Ninja's Google Images Data & Google SERP Data API common use cases:

Creative Media Production: Enhance digital content with a vast array of real-time images, ensuring engaging and brand-aligned visuals for blogs, social media, and advertising.

AI Model Enhancement: Train and refine AI models with diverse, annotated images, improving object recognition and image classification accuracy.

Trend Analysis: Identify emerging market trends and consumer preferences through real-time visual data, enabling proactive business decisions.

Innovative Product Design: Inspire product innovation by exploring current design trends and competitor products, ensuring market-relevant offerings.

Advanced Search Optimization: Improve search engines and applications with enriched image datasets, providing users with accurate, relevant, and visually appealing search results.

OpenWeb Ninja's Annotated Imagery Data & Google SERP Data Stats & Capabilities:

100B+ Images: Access an extensive database of over 100 billion images.

Images Data from all Public Sources (Google SERP Data): Benefit from a comprehensive aggregation of image data from various public websites, ensuring a wide range of sources and perspectives.

Extensive Search and Filtering Capabilities: Utilize advanced search operators and filters to refine image searches by file type, color, usage rights, creation time, and more, making it easy to find exactly what you need.

Rich Data Points: Each image comes with more than 10 data points, including URL, title (annotation), size information, thumbnail, and source information, providing a detailed context for each image.
rampnet-crop-model-dataset
huggingface.co
Updated Jul 15, 2025
Cite
Project Sidewalk (2025). rampnet-crop-model-dataset [Dataset]. https://huggingface.co/datasets/projectsidewalk/rampnet-crop-model-dataset
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Project Sidewalk
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
RampNet is a two-stage pipeline that addresses the scarcity of curb ramp detection datasets by using government location data to automatically generate over 210,000 annotated Google Street View panoramas. This new dataset is then used to train a state-of-the-art curb ramp detection model that significantly outperforms previous efforts. In this repo, we provide "the tiny set of manually labeled crops" that we refer to in both RampNet's GitHub repository and the paper. It contains test, train… See the full description on the dataset page: https://huggingface.co/datasets/projectsidewalk/rampnet-crop-model-dataset.
Replication Data for: A Study for Scholarly Impacts of International...
dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Balci, Ali; Filiz Cicioglu; Duygu Kalkan (2023). Replication Data for: A Study for Scholarly Impacts of International Relations Academics and Departments in Turkey through Google Scholar Data [Dataset]. http://doi.org/10.7910/DVN/EZTVWV
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/EZTVWV
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Balci, Ali; Filiz Cicioglu; Duygu Kalkan
Area covered
Türkiye
Description
Since computers made it possible to collect and evaluate large volumes of data, there has been a significant increase in studies measuring the impact of academics. This study aims to analyse International Relations scholars and departments in Turkey using data from Google Scholar citation counts. Through this measurement, the study will generate a new ranking list as an alternative to existing measurement lists. To control outcomes, Google-generated ranking lists will be compared with data generated from the Social Science Citation Index (SSCI). Thus, the study aims to make a data-based contribution to the quality assessment literature, which has become increasingly popular in Turkey.

Now that computers have made it possible to collect and evaluate large volumes of data, there has been a marked increase in studies that aim to measure the impact of academics. This study likewise aims to analyse International Relations academics and departments in Turkey using Google Scholar (GS) citation count data. The analysis produces a new ranking of academics and departments as an alternative to existing measurement lists. The results obtained from GS data were compared, as a control, with article counts and citations compiled from the Social Science Citation Index (SSCI) database. In this way, the study aims to make a data-based contribution to the quality assessment literature, which is becoming increasingly comprehensive in the Turkish context.
State of Iowa Google My Business Profile Analytics by Month
catalog.data.gov
s.cnmilf.com
Updated Jul 12, 2024
Cite
data.iowa.gov (2024). State of Iowa Google My Business Profile Analytics by Month [Dataset]. https://catalog.data.gov/dataset/state-of-iowa-google-my-business-profile-analytics-by-month
Dataset updated
Jul 12, 2024
Dataset provided by
data.iowa.gov
Area covered
Iowa
Description
This dataset provides monthly insights into how people find State of Iowa agency listings on the web via Google Search and Maps, and what they do once they find them, including providing reviews (ratings), accessing agency websites, requesting directions, and making calls.
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Feb 23, 2017
Dataset provided by
oxylabs, UAB
Authors
Oxylabs
Area covered
Nepal, Tunisia, Northern Mariana Islands, Bangladesh, British Indian Ocean Territory, Andorra, Moldova (Republic of), Taiwan, Isle of Man, Canada
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence.

AngelList: Receive fresh startup data transformed into actionable insights.

CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.

Craft.co: Make data-informed business decisions with Craft.co's company datasets.

Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
frames-benchmark
huggingface.co
Cite
Google, frames-benchmark [Dataset]. https://huggingface.co/datasets/google/frames-benchmark
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset authored and provided by
Google (http://google.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
FRAMES: Factuality, Retrieval, And reasoning MEasurement Set

FRAMES is a comprehensive evaluation dataset designed to test the capabilities of Retrieval-Augmented Generation (RAG) systems across factuality, retrieval accuracy, and reasoning. Our paper with details and experiments is available on arXiv: https://arxiv.org/abs/2409.12941.

Dataset Overview

824 challenging multi-hop questions requiring information from 2-15 Wikipedia articles

Questions span diverse topics… See the full description on the dataset page: https://huggingface.co/datasets/google/frames-benchmark.
rlu_atari_checkpoints_ordered
tensorflow.org
Updated Apr 8, 2022
Cite
(2022). rlu_atari_checkpoints_ordered [Dataset]. https://www.tensorflow.org/datasets/catalog/rlu_atari_checkpoints_ordered
Dataset updated
Apr 8, 2022
Description
RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed around the following consideration: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.

The datasets follow the RLDS format to represent steps and episodes.

We are releasing a large and diverse dataset of gameplay following the protocol described by Agarwal et al., 2020, which can be used to evaluate several discrete offline RL algorithms. The dataset is generated by running an online DQN agent and recording transitions from its replay during training with sticky actions (Machado et al., 2018). As stated in Agarwal et al., 2020, for each game we use data from five runs with 50 million transitions each. We release datasets for 46 Atari games. For details on how the dataset was generated, please refer to the paper. Please see this note about the ROM versions used to generate the datasets.

Atari is a standard RL benchmark. We recommend trying offline RL methods on Atari if you are interested in comparing your approach to other state-of-the-art offline RL methods with discrete actions.

The reward of each step is clipped to the range [-1, 1], and each episode includes the sum of its clipped rewards.
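
The clipping described above can be sketched in a few lines. This is a minimal illustration of [-1, 1] clipping and the per-episode sum, not the code used to build the dataset; the function names and the sample rewards are made up for the example.

```python
from typing import Iterable


def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clip a single step reward into the closed interval [low, high]."""
    return max(low, min(high, reward))


def clipped_episode_return(rewards: Iterable[float]) -> float:
    """Sum the clipped step rewards of one episode."""
    return sum(clip_reward(r) for r in rewards)


# Hypothetical raw game scores for the steps of one short episode.
raw_rewards = [0.0, 4.0, -7.0, 1.0, 10.0]
print([clip_reward(r) for r in raw_rewards])  # [0.0, 1.0, -1.0, 1.0, 1.0]
print(clipped_episode_return(raw_rewards))    # 2.0
```

Because clipping discards reward magnitude, the clipped return (2.0 here) can differ substantially from the raw game score (8.0 here).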

Each of the configurations is broken into splits. Splits correspond to checkpoints of 1M steps (note that the number of episodes may differ). Checkpoints are ordered in time (so checkpoint 0 ran before checkpoint 1).

Episodes within each split are ordered. Check https://www.tensorflow.org/datasets/determinism if you want to ensure that you read episodes in order.

This dataset corresponds to the one used in the DQN replay paper. https://research.google/tools/datasets/dqn-replay/

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('rlu_atari_checkpoints_ordered', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
SEC Public Dataset
console.cloud.google.com
Updated Jul 27, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=ko (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=ko
Dataset updated
Jul 27, 2023
Dataset provided by
Google (http://google.com/)
Description
In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
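
To put the monthly free-tier allowance in perspective, the sketch below estimates how many queries of a given size fit into it. It is back-of-the-envelope arithmetic only: reading 1TB as 10^12 bytes and the 25 GB per-query figure are assumptions for this example, and actual billing rules may differ.

```python
# Assumption: the 1TB monthly allowance is read as 10**12 bytes.
FREE_BYTES_PER_MONTH = 10**12


def queries_within_free_tier(bytes_per_query: int,
                             free_bytes: int = FREE_BYTES_PER_MONTH) -> int:
    """Return how many whole queries of a given size fit in the free allowance."""
    if bytes_per_query <= 0:
        raise ValueError("bytes_per_query must be positive")
    return free_bytes // bytes_per_query


# Hypothetical query that processes 25 GB of data.
print(queries_within_free_tier(25 * 10**9))  # 40
```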

Cite

Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex

Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights

Available download formats: .json, .csv

Dataset updated

Feb 3, 2025

Dataset authored and provided by

Dataplex

Area covered

Grenada, Guinea, Palau, British Indian Ocean Territory, Ethiopia, South Georgia and the South Sandwich Islands, Korea (Democratic People's Republic of), Bhutan, Sweden, French Polynesia

Description

The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

Dataset Highlights

Location-Specific Reviews – Reviews and ratings for the locations you provide.
Daily Updates – New reviews and rating changes updated automatically.
Historical Data Access – Retrieve past reviews where available.
AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.
Competitive Benchmarking – Compare performance across selected locations.

Use Cases

Franchise & Retail Chains – Monitor brand reputation and performance across locations.
Hospitality & Restaurants – Track guest sentiment and service trends.
Healthcare & Medical Facilities – Understand patient feedback for specific locations.
Real Estate & Property Management – Analyze tenant and customer experiences through reviews.
Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

Data Updates & Delivery

Update Frequency: Daily
Data Format: CSV for easy integration
Delivery: Secure file transfer (SFTP or cloud storage)

Data Fields Include:

Business Name
Location Details
Star Ratings
Review Text
Timestamps
Reviewer Metadata
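
To show how fields like these might be used for the weekly, monthly, or yearly trend views mentioned in this listing, the sketch below averages star ratings per month from a small CSV. The column names (business_name, location, star_rating, review_text, timestamp), the ISO timestamp format, and the sample rows are all assumptions for illustration; the actual delivery schema is not given here.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from typing import Dict

# Hypothetical sample in an assumed schema; the real column names may differ.
SAMPLE_CSV = """business_name,location,star_rating,review_text,timestamp
Cafe A,Springfield,5,Great coffee,2025-01-03T09:15:00
Cafe A,Springfield,3,Slow service,2025-01-20T17:40:00
Cafe A,Springfield,4,Nice staff,2025-02-02T11:05:00
"""


def monthly_average_rating(csv_text: str) -> Dict[str, float]:
    """Average the star ratings per calendar month (keys formatted as YYYY-MM)."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [rating sum, review count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
        totals[month][0] += float(row["star_rating"])
        totals[month][1] += 1
    return {month: total / count for month, (total, count) in sorted(totals.items())}


averages = monthly_average_rating(SAMPLE_CSV)
print(averages)  # {'2025-01': 4.0, '2025-02': 4.0}
```

Grouping by week or year only requires a different strftime pattern, for example "%Y" for yearly averages.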

Optional Add-Ons:

AI Sentiment Analysis – Aggregate trends by week, month, or year.
Custom Location Tracking – Tailor the dataset to fit your specific business needs.

Ideal for

Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.
Business Analysts – Use structured review data to track customer sentiment over time.
Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.
Competitive Intelligence – Compare locations and benchmark against industry competitors.

Why Choose This Dataset?

Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.
Scalable & Customizable – Track only the locations that matter to you.
Actionable Insights – AI-driven summaries for quick decision-making.
Easy Integration – Delivered in a structured format for seamless analysis.

By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.
