https://brightdata.com/license
The Google Maps dataset is ideal for getting extensive information on businesses anywhere in the world. Easily filter by location, business type, and other factors to get the exact data you need. The Google Maps dataset includes all major data points: timestamp, name, category, address, description, open website, phone number, open_hours, open_hours_updated, reviews_count, rating, main_image, reviews, url, lat, lon, place_id, country, and more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into three sets: train, index, and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks, and technical solutions from top teams.
You can check the field descriptions in the documentation: current Keyword database: https://docs.dataforseo.com/v3/databases/google/keywords/?bash; Historical Keyword database: https://docs.dataforseo.com/v3/databases/google/history/keywords/?bash. You don’t have to download fresh data dumps in JSON or CSV; we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.
https://github.com/microsoft/Computational-Use-of-Data-Agreement
A dataset of 1.56 million synthetic images of objects in 3D scenes. The dataset was created by researchers at Google AI and is used for research in machine learning and computer vision tasks such as object detection, segmentation, and 3D reconstruction.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Welcome to the Google Places Comprehensive Business Dataset! This dataset has been meticulously scraped from Google Maps and presents extensive information about businesses across several countries. Each entry in the dataset provides detailed insights into business operations, location specifics, customer interactions, and much more, making it an invaluable resource for data analysts and scientists looking to explore business trends, geographic data analysis, or consumer behaviour patterns.
This dataset is ideal for a variety of analytical projects, including:
- Market Analysis: Understand business distribution and popularity across different regions.
- Customer Sentiment Analysis: Explore relationships between customer ratings and business characteristics.
- Temporal Trend Analysis: Analyze patterns of business activity throughout the week.
- Geospatial Analysis: Integrate with mapping software to visualise business distribution or cluster businesses based on location.
The dataset contains 46 columns, providing a thorough profile for each listed business. Key columns include:
- business_id: A unique Google Places identifier for each business, ensuring distinct entries.
- phone_number: The contact number associated with the business. It provides a direct means of communication.
- name: The official name of the business as listed on Google Maps.
- full_address: The complete postal address of the business, including locality and geographic details.
- latitude: The geographic latitude coordinate of the business location, useful for mapping and spatial analysis.
- longitude: The geographic longitude coordinate of the business location.
- review_count: The total number of reviews the business has received on Google Maps.
- rating: The average user rating out of 5 for the business, reflecting customer satisfaction.
- timezone: The world timezone the business is located in, important for temporal analysis.
- website: The official website URL of the business, providing further information and contact options.
- category: The category or type of service the business provides, such as restaurant, museum, etc.
- claim_status: Indicates whether the business listing has been claimed by the owner on Google Maps.
- plus_code: A sho...
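For the geospatial analysis mentioned above, the latitude, longitude, and rating columns are enough for a simple radius search. The sketch below is a minimal illustration, assuming each row is a dict keyed by the column names listed above (for example, as produced by `csv.DictReader`); the function names and the sample rows are hypothetical, not part of the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def businesses_within(rows, lat, lon, radius_km, min_rating=0.0):
    """Return (name, distance_km) for rows within radius_km of (lat, lon).

    Rows are assumed to be dicts with 'name', 'latitude', 'longitude', and
    'rating' keys; a missing rating is treated as 0. Results are sorted by distance.
    """
    hits = []
    for row in rows:
        d = haversine_km(lat, lon, float(row["latitude"]), float(row["longitude"]))
        rating = float(row["rating"] or 0)
        if d <= radius_km and rating >= min_rating:
            hits.append((row["name"], round(d, 2)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical rows in the dataset's column layout (values are made up).
rows = [
    {"name": "Cafe A", "latitude": "0.0", "longitude": "0.0", "rating": "4.5"},
    {"name": "Store B", "latitude": "0.0", "longitude": "1.0", "rating": "4.8"},
    {"name": "Shop C", "latitude": "0.0", "longitude": "0.05", "rating": "3.0"},
]
print(businesses_within(rows, 0.0, 0.0, radius_km=10, min_rating=4.0))  # [('Cafe A', 0.0)]
```

Haversine is used here because it is accurate enough for city-scale radii; a projected coordinate system or a spatial index would be a better fit for clustering millions of rows.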
This is a GPS dataset acquired from Google.
Google tracks the user’s device location through Google Maps, which also works on Android devices, the iPhone, and the web. The Timeline can be viewed from the user’s settings in the Google Maps app on Android or directly on the Google Timeline website. It includes detailed information, such as when an individual is walking, driving, or flying. Users can enable or disable this tracking on demand, directly from the smartphone or via the website. Google offers a Takeout service through which users can download all of their data, or select the Google products whose data they want to download. The dataset contains 120,847 instances from a single user, covering 253 unique days over a nine-month period from February 2019 to October 2019. Each instance comprises a (latitude, longitude) pair and a timestamp. All the data was delivered in a single CSV file. Because the locations in this dataset are well known to the researchers, the dataset is intended to serve as ground truth in many mobility studies.
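The record and day counts quoted above can be recomputed from the CSV with a short pass over the rows. This is a minimal sketch, assuming a header row with a column named `timestamp` holding ISO 8601 strings; the real file's column names and timestamp format may differ (for example, epoch milliseconds), so both are placeholders here, and the sample rows are made up.

```python
import csv
import io
from datetime import datetime


def summarize_gps(lines, ts_column="timestamp"):
    """Return (number of records, number of distinct calendar days).

    `lines` is any iterable of CSV lines with a header row, such as an open file.
    Timestamps are assumed to be ISO 8601 strings; `ts_column` is a placeholder name.
    """
    n_records = 0
    days = set()
    for row in csv.DictReader(lines):
        n_records += 1
        days.add(datetime.fromisoformat(row[ts_column]).date())
    return n_records, len(days)


# Illustrative input with made-up coordinates (not taken from the dataset).
sample = io.StringIO(
    "latitude,longitude,timestamp\n"
    "41.1496,-8.6110,2019-02-01T08:00:00\n"
    "41.1579,-8.6291,2019-02-01T18:30:00\n"
    "41.1496,-8.6110,2019-02-02T09:15:00\n"
)
print(summarize_gps(sample))  # (3, 2)
```

On the full file, opened with `open(path, newline="")`, the result should match the 120,847 instances and 253 unique days stated in the description if the assumptions about the columns hold.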
Please cite the following papers in order to use the datasets:
T. Andrade, B. Cancela, and J. Gama, "Discovering locations and habits from human mobility data," Annals of Telecommunications, vol. 75, no. 9, pp. 505–521, 2020. DOI: 10.1007/s12243-020-00807-x
T. Andrade, B. Cancela, and J. Gama, "From mobility data to habits and common pathways," Expert Systems, vol. 37, no. 6, p. e12627, 2020. DOI: 10.1111/exsy.12627
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for Boolq
Dataset Summary
BoolQ is a question answering dataset for yes/no questions containing 15,942 examples. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/boolq.
The International Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days, and will be accompanied by a rolling five-year window of historical data for each country and region across the globe, where data is available. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for IFEval
Dataset Summary
This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions", such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. To load the dataset, run:
from datasets import load_dataset
ifeval = load_dataset("google/IFEval")
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/IFEval.
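To make "verifiable by heuristics" concrete, the two example instructions can be checked with a few lines of code. This is a simplified sketch written for illustration only; the function names are hypothetical, and the benchmark's own checking code may tokenize words or match keywords differently.

```python
import re


def count_words(text):
    """Count whitespace-separated words (a deliberately simple tokenizer)."""
    return len(text.split())


def check_min_words(response, min_words):
    """'Write in more than N words': True if the response has strictly more than N words."""
    return count_words(response) > min_words


def check_keyword_frequency(response, keyword, min_count):
    """'Mention the keyword X at least N times', matched as a whole word, case-insensitively."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= min_count


print(check_min_words("word " * 401, 400))                            # True
print(check_keyword_frequency("AI helps. ai is here. Use AI.", "AI", 3))  # True
```

Whole-word matching keeps "AIR" from counting as a mention of "AI"; a stricter checker might also require the keyword's original casing.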
https://crawlfeeds.com/privacy_policy
Unlock valuable insights with the Google Play Store Android Apps Dataset in CSV format, featuring detailed information on thousands of Android apps available on the Google Play Store. This comprehensive dataset includes key attributes such as App Name, App Logo, Category, Description, Average Rating, Ratings Count, In-app Purchases, Operating System, Company, Content Rating, Images, Email, Additional Information, and more.
Perfect for market researchers, data scientists, app developers, and analysts, this dataset allows for deep analysis of app performance, user preferences, and industry trends. With data on app descriptions, content ratings, in-app purchases, and company information, you can track trends in the mobile app market, evaluate user satisfaction, and conduct competitive analysis.
The dataset is ideal for businesses looking to optimize app strategies, enhance user experience, and improve app performance based on real user feedback. Easily import the data into your favorite analysis tools to gain actionable insights for your app development or research.
With regularly updated data scraped directly from the Google Play Store, the Google Play Store Android Apps Dataset is an invaluable resource for anyone looking to explore trends, track performance, or enhance their app strategies.
Dataset Card for "wiki40b"
Dataset Summary
Cleaned-up text from 40+ Wikipedia language editions, covering pages that correspond to entities. The datasets have train/dev/test splits per language. The dataset is cleaned up by page filtering to remove disambiguation pages, redirect pages, deleted pages, and non-entity pages. Each example contains the Wikidata ID of the entity and the full Wikipedia article after page processing that removes non-content sections and structured objects.… See the full description on the dataset page: https://huggingface.co/datasets/google/wiki40b.
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for "civil_comments"
Dataset Summary
The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created between 2015 and 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data… See the full description on the dataset page: https://huggingface.co/datasets/google/civil_comments.
Don't forget to upvote, comment, and follow if you are using this dataset. If you have any questions about the dataset I uploaded, feel free to leave them in the comments. Thank you! :)
Column Descriptions (English)
1. reviewId: A unique ID for each user review.
2. userName: The name of the user who submitted the review.
3. userImage: The URL of the user's profile picture.
4. content: The text content of the review provided by the user.
5. score: The review score given by the user, typically on a scale of 1-5.
6. thumbsUpCount: The number of likes (thumbs up) received by the review.
7. reviewCreatedVersion: The app version used by the user when creating the review (not always available).
8. at: The date and time when the review was submitted.
9. replyContent: The developer's response to the review (no data available in this column).
10. repliedAt: The date and time when the developer's response was submitted (no data available in this column).
11. appVersion: The app version used by the user when submitting the review (not always available).
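Because `score` is on a 1-5 scale and `appVersion` is not always available, a first summary usually needs to tolerate missing values. The sketch below assumes each review is a dict keyed by the column names above (for example, rows from `csv.DictReader`, where a missing version arrives as an empty string); the helper names and sample reviews are hypothetical.

```python
from collections import Counter


def score_summary(reviews):
    """Return (mean score, Counter of scores) for reviews with a 1-5 'score' field."""
    scores = [int(r["score"]) for r in reviews]
    if not scores:
        return 0.0, Counter()
    return sum(scores) / len(scores), Counter(scores)


def reviews_per_version(reviews):
    """Count reviews per appVersion, grouping missing or empty versions as 'unknown'."""
    return Counter(r.get("appVersion") or "unknown" for r in reviews)


# Hypothetical reviews using the dataset's column names (values are made up).
reviews = [
    {"reviewId": "r1", "score": "5", "appVersion": "1.2.0"},
    {"reviewId": "r2", "score": "3", "appVersion": ""},
    {"reviewId": "r3", "score": "4"},
]
print(score_summary(reviews)[0])     # 4.0
print(reviews_per_version(reviews))  # Counter({'unknown': 2, '1.2.0': 1})
```

The same pattern applies to `reviewCreatedVersion`; `replyContent` and `repliedAt` are described as empty, so they carry no signal here.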
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data that is collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such population-level data to a model that requires individual-level information raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage a specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. By overlaying this evolving mobility network with a compartmental disease model that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We find that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that, used appropriately, this data can inform public health policies and have an impact on pandemic responses.
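As a rough illustration of what "a compartmental model for disease that incorporates vaccination status" means, the sketch below integrates a single-population SIR model with a separate vaccinated susceptible group. It is not the paper's model: it ignores the mobility network entirely, and all parameter values (transmission rate, recovery rate, vaccine efficacy, population size) are hypothetical.

```python
def simulate_sirv(population, vaccinated_frac, beta, gamma, efficacy,
                  initial_infected, days, dt=0.1):
    """Forward-Euler integration of an SIR model with a vaccinated susceptible group.

    Returns a list of (S_unvaccinated, S_vaccinated, I, R) tuples, one per step.
    Vaccinated susceptibles are infected at (1 - efficacy) times the normal rate.
    """
    susceptible = population - initial_infected
    s_v = susceptible * vaccinated_frac
    s_u = susceptible - s_v
    i, r = float(initial_infected), 0.0
    history = []
    for _ in range(int(round(days / dt))):
        force = beta * i / population               # per-capita force of infection
        new_u = force * s_u * dt                    # infections among the unvaccinated
        new_v = force * (1 - efficacy) * s_v * dt   # breakthrough infections
        recovered = gamma * i * dt
        s_u -= new_u
        s_v -= new_v
        i += new_u + new_v - recovered
        r += recovered
        history.append((s_u, s_v, i, r))
    return history


def attack_size(history):
    """Number ever infected by the end of the run (I + R, including the initial seed)."""
    _, _, i, r = history[-1]
    return i + r


# Hypothetical parameters: basic reproduction number beta / gamma = 3.
low = simulate_sirv(10_000, 0.0, beta=0.3, gamma=0.1, efficacy=0.9,
                    initial_infected=10, days=200)
high = simulate_sirv(10_000, 0.8, beta=0.3, gamma=0.1, efficacy=0.9,
                     initial_infected=10, days=200)
print(round(attack_size(low)), round(attack_size(high)))
```

In a network setting such as the one the abstract describes, each town would carry its own set of compartments, and the estimated flows of people between towns would move individuals between them at every step.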
For U.S. public companies, certain insiders and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SMOL
SMOL (Set for Maximal Overall Leverage) is a collection of professional translations into 221 low-resource languages, for the purpose of training translation models and otherwise increasing the representation of these languages in NLP and technology. Please read the SMOL paper and the GATITOS paper for a much more thorough description! There are four resources in this directory:
SmolDoc: document-level translations into 106 language pairs (105 unique languages) SmolSent:… See the full description on the dataset page: https://huggingface.co/datasets/google/smol.
Dataset Card for Dataset Name
Dataset Summary
This dataset is a subset of Kaggle's Google Landmark Recognition 2021 competition data, restricted to the categories with more than 500 images. https://www.kaggle.com/competitions/landmark-recognition-2021/data The dataset consists of a total of 45,579 224x224 color images in 51 categories.
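The subsetting rule ("only the categories with more than 500 images") amounts to a frequency filter on the label column. The sketch below assumes the competition labels have been read into a list of (image_id, landmark_id) pairs; the function name and the toy input are hypothetical, and resizing images to 224x224 is a separate step not shown.

```python
from collections import Counter


def filter_frequent_categories(samples, min_images=500):
    """Keep samples whose landmark_id has strictly more than `min_images` images.

    `samples` is a list of (image_id, landmark_id) pairs.
    Returns (filtered samples, sorted list of kept landmark_ids).
    """
    counts = Counter(landmark_id for _, landmark_id in samples)
    kept = {lid for lid, n in counts.items() if n > min_images}
    filtered = [s for s in samples if s[1] in kept]
    return filtered, sorted(kept)


# Toy example with a threshold of 2 instead of 500 (made-up IDs).
toy = [("a", 1), ("b", 1), ("c", 1), ("d", 2), ("e", 2), ("f", 3)]
subset, categories = filter_frequent_categories(toy, min_images=2)
print(categories, len(subset))  # [1] 3
```

Applied to the full competition labels with `min_images=500`, this rule should yield the 51 categories described above, assuming no further filtering was done.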
Languages
English
Dataset Structure
Data Fields
landmark_id: Int - Numeric identifier of the category category :… See the full description on the dataset page: https://huggingface.co/datasets/pemujo/GLDv2_Top_51_Categories.
This dataset contains current and historical demographic data on Google's workforce since the company began publishing diversity data in 2014. It includes data collected for government reporting and voluntary employee self-identification globally relating to hiring, retention, and representation, categorized by race, gender, sexual orientation, gender identity, disability status, and military status. In some instances, the data is limited due to various government policies around the world and the desire to protect Googler confidentiality. All data in this dataset will be updated yearly upon publication of Google’s Diversity Annual Report. Google uses this data to inform its diversity, equity, and inclusion work. More information on our methodology can be found in the Diversity Annual Report. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Management
• Create and edit Fusion Tables
• Upload imagery, vector, and tabular data using Fusion Tables and KMLs
• Share data with other Google Earth Engine (GEE) users, and download imagery after manipulation in GEE
MultiversX is a highly scalable, secure, and decentralized blockchain network created to enable radically new applications for users, businesses, society, and the new metaverse frontier. This dataset is one of many crypto datasets that are available within Google Cloud Public Datasets. As with other Google Cloud public datasets, you can query this dataset for free, up to 1TB of processing every month. Watch this short video to learn how to get started with the public datasets.