Between January and July 2024, Google received 61,402 requests for disclosure of user information from United States federal agencies and courts. This is a slight decrease compared to the second half of 2023, when over 63,000 requests were made.
In the second half of 2023, Google received more than 216 thousand requests for disclosure of user information from government agencies and courts worldwide. In the same period, the number of accounts subject to those requests was approximately 441 thousand.
In the second half of 2023, 82 percent of the user data requests sent to Google by government agencies worldwide resulted in the disclosure of some information. In the first half of 2019, this figure stood at 73 percent.
https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate, including organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
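As a concrete illustration of the first question above (total transactions per device browser in July 2017), here is a minimal sketch using the google-cloud-bigquery Python client. It assumes the sample is queried from its public BigQuery location, bigquery-public-data.google_analytics_sample.ga_sessions_*, and that you have a GCP project with BigQuery access; adjust the table path if your copy lives elsewhere.

```python
# Minimal sketch: total transactions per device browser in July 2017.
# Assumes the public BigQuery copy of the GA360 sample dataset.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project credentials

query = """
SELECT
  device.browser,
  SUM(totals.transactions) AS total_transactions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY
  device.browser
ORDER BY
  total_transactions DESC
"""

for row in client.query(query).result():
    print(row.browser, row.total_transactions)
```

The other questions follow the same pattern with different aggregations over the totals and hits fields.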
In the first half of 2024, Google received over 82,000 requests for disclosure of user information from U.S. federal agencies and other government entities. The Indian government ranked second in the number of user information disclosure requests sent to Google, followed by Germany.
https://creativecommons.org/publicdomain/zero/1.0/
NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/
Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:
Over 8 million 311 service requests from 2012-2016
More than 1 million motor vehicle collisions 2012-present
Citi Bike stations and 30 million Citi Bike trips 2013-present
Over 1 billion Yellow and Green Taxi rides from 2009-present
Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015
This dataset is deprecated and not being updated.
Fork this kernel to get started with this dataset.
https://opendata.cityofnewyork.us/
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.
The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.
Banner Photo by @bicadmedia from Unsplash.
On which New York City streets are you most likely to find a loud party?
Can you find the Virginia Pines in New York City?
Where was the only collision caused by an animal that injured a cyclist?
What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
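As a hedged sketch for the first question above (loud parties), the query below counts "Loud Music/Party" 311 complaints by street using the google-cloud-bigquery Python client. The table path bigquery-public-data.new_york.311_service_requests and the column names are assumptions based on the public BigQuery copy of the NYC 311 data; since this dataset is deprecated, verify them against the current schema before relying on the results.

```python
# Hedged sketch: streets with the most "Loud Music/Party" 311 complaints.
# Table path and column names assume the public BigQuery NYC 311 copy.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  street_name,
  COUNT(*) AS loud_party_complaints
FROM
  `bigquery-public-data.new_york.311_service_requests`
WHERE
  descriptor = 'Loud Music/Party'
GROUP BY
  street_name
ORDER BY
  loud_party_complaints DESC
LIMIT 10
"""

for row in client.query(query).result():
    print(row.street_name, row.loud_party_complaints)
```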
https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle, used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
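A small helper, sketched below, shows how the layout described above maps a KernelVersions id to its folder. Whether sub-folder names are zero-padded is not stated here, so treat the exact formatting as an assumption to verify against the actual directory listing.

```python
# Sketch: map a KernelVersions id to its folder in the two-level layout described above.
def kernel_version_folder(version_id: int) -> str:
    top = version_id // 1_000_000            # e.g. 123 for ids 123,000,000-123,999,999
    sub = (version_id % 1_000_000) // 1_000  # e.g. 456 for ids 123,456,000-123,456,999
    return f"{top}/{sub}"                    # padding of folder names is an assumption

# Example: version id 123,456,789 maps to folder "123/456".
print(kernel_version_folder(123_456_789))
```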
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
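A minimal sketch of downloading one object from the requester-pays bucket with the google-cloud-storage Python client is below. The billing project and the object path are placeholders; the project passed as user_project is the one charged for the download.

```python
# Minimal sketch: download one object from a requester-pays GCS bucket.
# YOUR_BILLING_PROJECT and the object path are placeholders, not real values.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("kaggle-meta-kaggle-code-downloads",
                       user_project="YOUR_BILLING_PROJECT")  # requester pays
blob = bucket.blob("path/to/notebook.ipynb")  # hypothetical object path
blob.download_to_filename("notebook.ipynb")
```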
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
You can check the field descriptions in the documentation: current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; Historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.
Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.
Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.
Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.
This database is available in JSON format only.
You don’t have to download fresh data dumps in JSON – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.
Google Suite is an umbrella Information System through which USAID receives multiple Google services under USAID's subscription contract. Business services include, but are not limited to: business email through Gmail; video and voice conferencing; secure team messaging; shared calendars; documents, spreadsheets, and presentations; unlimited cloud storage; and smart search across G Suite with Cloud Search. Security and administration controls include: control over how long email messages and on-the-record chats are retained; policies specified for the entire domain or based on organizational units, date ranges, and specific terms; archiving and retention policies for emails and chats; security center for G Suite; eDiscovery for emails, chats, and files; audit reports to track user activity; data loss prevention for Gmail; data loss prevention for Drive; hosted S/MIME for Gmail; integration of Gmail with compliant third-party archiving tools; enterprise-grade access control with security key enforcement; and Gmail log analysis in BigQuery.
In the first half of 2023, around 34,941 requests to view Google user data were made by governmental institutions in Germany. The number of inquiries for Google user data from governmental institutions in Germany more than doubled from the first half of 2020 to the first half of 2023.
This dataset was created by Ethan Tyler Rundquist
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
This dataset contains the valuation template the researcher can use to retrieve real-time stock prices in Excel and in Google Sheets. The dataset is provided by Finsheet, the leading financial data provider for spreadsheet users. To get more financial data, visit the website and explore their functions. For instance, if a researcher would like to get the last 30 years of income statements for Meta Platforms Inc, the syntax would be =FS_EquityFullFinancials("FB", "ic", "FY", 30). In addition, the following syntax will return the latest stock price for Caterpillar Inc right in your spreadsheet: =FS_Latest("CAT"). If you need assistance with any of the functions, feel free to reach out to their customer support team. To get started, install their Excel and Google Sheets add-on.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OpenSim is an open-source biomechanical package with a variety of applications. It is available to many users with bindings in MATLAB, Python, and Java via its application programming interfaces (APIs). Although the developers have documented OpenSim installation on different operating systems (Windows, Mac, and Linux) well, installation is time-consuming and complex since each operating system requires a different configuration. This project aims to demystify the development of neuro-musculoskeletal modeling in OpenSim with zero installation configuration on any operating system (thus cross-platform), making it easy to share models while accessing free graphics processing units (GPUs) on the web-based Google Colab platform. To achieve this, OpenColab was developed: the OpenSim source code was used to build a Conda package that can be installed on Google Colab with only one block of code in less than 7 minutes. To use OpenColab, one only requires an internet connection and a Gmail account. Moreover, OpenColab can access the vast libraries of machine learning methods available within free Google products, e.g. TensorFlow. Next, we performed an inverse problem in biomechanics and compared OpenColab results with the OpenSim graphical user interface (GUI) for validation. The outcomes of OpenColab and the GUI matched well (r≥0.82). OpenColab takes advantage of the zero-configuration of cloud-based platforms, accesses GPUs, and enables users to share and reproduce modeling approaches for further validation, innovative online training, and research applications. Step-by-step installation processes and examples are available at: https://simtk.org/projects/opencolab.
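For readers who want a feel for this pattern, below is a hedged sketch of installing a Conda package inside a Colab notebook. It uses the condacolab bootstrap, and the channel/package names (opensim-org / opensim) are assumptions for illustration only; the actual OpenColab installation cell is documented on the SimTK project page linked above.

```python
# Hypothetical Colab cell illustrating the "one block of code" pattern described above.
# condacolab bootstraps a conda environment inside Colab; the channel/package names
# below are assumptions, not the official OpenColab recipe.
!pip install -q condacolab
import condacolab
condacolab.install()  # installs conda and restarts the Colab runtime once
```

```python
# In a new cell, after the runtime restart:
!conda install -y -c opensim-org opensim   # assumed channel/package name
import opensim
print(opensim.GetVersionAndDate())         # confirm the bindings load
```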
What Makes Our Data Unique?
Autoscraping’s Google Places Review Data is a premium resource for organizations seeking in-depth consumer insights from a trusted global platform. What sets our data apart is its sheer volume and quality—spanning over 10 million reviews from Google Places worldwide. Each review includes critical attributes such as ratings, comment titles, comment bodies, and detailed sentiment analysis. This data is meticulously curated to capture the authentic voice of consumers, offering a rich source of information for understanding customer satisfaction, brand perception, and market trends.
Our dataset is unique not only because of its scale but also due to the richness of its metadata. We provide granular details about each review, including the review source, place ID, and post date, allowing for precise temporal and spatial analysis. This level of detail enables users to track changes in consumer sentiment over time, correlate reviews with specific locations, and conduct deep dives into customer feedback across various industries.
Moreover, the dataset is continuously updated to ensure it reflects the most current opinions and trends, making it an invaluable tool for real-time market analysis and competitive intelligence.
How is the Data Generally Sourced?
The data is sourced directly from Google Places, one of the most widely used platforms for business reviews and location-based feedback globally. Our robust web scraping infrastructure is specifically designed to extract every relevant piece of information from Google Places efficiently and accurately. We employ advanced scraping techniques that allow us to capture a wide array of review data across multiple industries and geographic locations.
The scraping process is conducted at regular intervals to ensure that our dataset remains up-to-date with the latest consumer feedback. Each entry undergoes rigorous data validation and cleaning processes to remove duplicates, correct inconsistencies, and enhance data accuracy. This ensures that users receive high-quality, reliable data that can be trusted for critical decision-making.
Primary Use-Cases and Verticals
This Google Places Review Data is a versatile resource with a wide range of applications across various verticals:
Consumer Insights and Market Research: Companies can leverage this data to gain a deeper understanding of consumer opinions and preferences. By analyzing ratings, comments, and sentiment across different locations and industries, businesses can identify emerging trends, discover potential areas for improvement, and better align their products or services with customer needs.
Brand Reputation Management: Organizations can use this data to monitor their brand reputation across multiple locations. The dataset enables users to track customer sentiment over time, identify patterns in feedback, and respond proactively to negative reviews. This helps businesses maintain a positive brand image and enhance customer loyalty.
Competitive Analysis: By analyzing reviews and ratings of competitors, companies can gain valuable insights into their strengths and weaknesses. This data can inform strategic decisions, such as product development, marketing campaigns, and customer engagement strategies.
Location-Based Marketing: Marketers can utilize this data to tailor their campaigns based on regional customer preferences and sentiments. The geolocation aspect of the data allows for precise targeting, ensuring that marketing efforts resonate with local audiences.
Product and Service Improvement: Businesses can use the detailed feedback from Google Places reviews to identify specific areas where their products or services may be falling short. This information can be used to drive improvements and innovations, ultimately enhancing customer satisfaction and business performance.
Real-Time Sentiment Analysis: The continuous update of our dataset makes it ideal for real-time sentiment analysis. Companies can track how customer sentiment evolves in response to new products, services, or market events, allowing them to react quickly and adapt to changing market conditions.
How Does This Data Product Fit into Our Broader Data Offering?
Autoscraping’s Google Places Review Data is a vital component of our comprehensive data offering, which spans various industries and geographies. This dataset complements our broader portfolio of consumer feedback data, which includes reviews from other major platforms, social media sentiment data, and customer satisfaction surveys.
By integrating this Google Places data with other datasets in our portfolio, users can develop a more holistic view of consumer behavior and market dynamics. For example, combining review data with sales data or demographic information can provide deeper insights into how different factors influence customer satisfaction and purchasing decisions.
Our commitment to delivering high-...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project investigates the perceived characteristics of the data collected by the Google Voice Assistant (i.e. speech records). Speech records were collected through data donation, analysed, represented visually, and used during semi-structured interviews to interrogate people's perceptions of sensitivity and intimacy. The dataset includes the analysis and interview protocol, the visual representation of the data, and the thematic structure of the results.
https://www.datainsightsmarket.com/privacy-policy
Indonesia Big Data Analytics Software Market Analysis
The Indonesia Big Data Analytics Software market is poised to witness substantial growth over the forecast period of 2025-2033, with a CAGR of 9.35%. In 2025, the market stood at a value of USD 43.15 million and is projected to reach a remarkable value by 2033. This growth is primarily driven by the increasing adoption of digital technologies, the proliferation of data-intensive applications, and the growing need for businesses to make data-driven decisions. Key trends shaping the market include the rising popularity of cloud-based big data analytics solutions, the emergence of advanced analytics techniques such as machine learning and artificial intelligence, and the growing awareness of data privacy and security concerns. Despite these positive factors, the market faces challenges such as the lack of skilled professionals in data analytics, the high cost of implementation, and the complexities associated with managing and integrating large volumes of data. Prominent players in the market include Teradata, SAS, SAP, Tableau Software, and IBM Corporation, among others.
Market Size and Growth: The Indonesia Big Data Analytics Software Market is projected to grow from USD 235.6 million in 2023 to USD 1,159.1 million by 2029, exhibiting a CAGR of 24.3% during the forecast period. This growth can be attributed to the increasing adoption of big data analytics solutions by organizations to enhance their decision-making, improve operational efficiency, and gain a competitive advantage.
Recent developments include: June 2024: Indosat Ooredoo Hutchison (Indosat) and Google Cloud expanded their long-term alliance to accelerate Indosat’s transformation from telco to AI Native TechCo. The collaboration will combine Indosat’s vast network, operational, and customer datasets with Google Cloud’s unified AI stack to deliver exceptional experiences to over 100 million Indosat customers and generative AI (GenAI) solutions for businesses across Indonesia. These include geospatial analytics and predictive modeling, real-time conversation analysis, and back-office transformation. Indosat’s early adoption of an AI-ready data analytics platform exemplifies its forward-thinking approach. June 2024: Palo Alto Networks launched a new cloud facility in Indonesia, catering to the rising demand for local data residency compliance. The move empowers organizations in Indonesia with access to Palo Alto Networks' Cortex XDR advanced AI and analytics platform, which offers a comprehensive security solution by unifying endpoint, network, and cloud data. With this new infrastructure, Indonesian customers can ensure data residency by housing their logs and analytics within the country.
Key drivers for this market are: higher emphasis on the use of analytics tools to empower decision making; rapid increase in the generation of data, coupled with the availability of several end-user-specific tools due to growth in the local landscape.
Potential restraints include: higher emphasis on the use of analytics tools to empower decision making; rapid increase in the generation of data, coupled with the availability of several end-user-specific tools due to growth in the local landscape.
Notable trends are: small and medium enterprises to hold major market share.
This dataset was created by Elliott Maglio
This dataset provides monthly insights into how people find State of Iowa agency listings on the web via Google Search and Maps, and what they do once they find them, including providing reviews (ratings), accessing agency websites, requesting directions, and making calls.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries about different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.
Each file in the collection has a name that indicates the location from which the search was performed, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
The locations are Philippines (PHI), United States (US), and India (IN). The userIDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India.
Products were chosen according to a list of 130 keywords (e.g., MP3 player, MP4 watch, personal organizer, television, etc.).
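A small sketch for splitting these file names into their components is below. The concrete example name is hypothetical, and the pattern may need adjusting if real product strings contain characters not anticipated here.

```python
# Sketch: parse the file-name convention
# no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
import re

NAME_RE = re.compile(
    r"^no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)\."
    r"(?P<product>.+)\.shopping_testing\.(?P<index>\d+)\.html$"
)

def parse_result_filename(name: str) -> dict:
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return m.groupdict()

# Hypothetical example file name:
print(parse_result_filename("no_email_US_1.Television.shopping_testing.3.html"))
# -> {'location': 'US', 'user_id': '1', 'product': 'Television', 'index': '3'}
```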
In the following, we describe how the search results have been collected.
Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.
To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.
A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.
The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).
Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with their own associated cookies.
The experiments ran for 24 hours on average. In each of them, the software ran on our local server, but the browser's traffic was redirected to the designated remote servers (e.g., to India) via tunneling through SOCKS proxies; this way, all commands were simultaneously distributed over all proxies. The experiments used the Mozilla Firefox browser (version 45.0) for the web browsing tasks and ran under Ubuntu 14.04. Also, for each query, we considered the first page of results, containing 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.
Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.
The search results have been analyzed in order to check if there were evidence of price steering, based on users' location.
One term of use applies:
In any research product whose findings are based on this dataset, please cite
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}