100+ datasets found

Google Trends Dataset
kaggle.com
Updated Feb 13, 2021
Cite
Dhruvil Dave (2021). Google Trends Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/1936665
Unique identifier
https://doi.org/10.34740/kaggle/dsv/1936665
Dataset updated
Feb 13, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dhruvil Dave
License
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Description
This is a curated dataset of Google Trends over the years. Every year, Google releases the trending search queries all over the world in various categories. It has trends from 2001 to 2020.

Image Credits: Unsplash - lukecheeser
Google Trends - International
console.cloud.google.com
Updated Jul 22, 2018
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&inv=1&invt=Ab2hhQ (2018). Google Trends - International [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl
Dataset updated
Jul 22, 2018
Dataset provided by
Google Search (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Description
The International Google Trends dataset provides signals that individual users and businesses alike can use to make better data-driven decisions. It reduces manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. The dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends, made available as two separate BigQuery tables, with a new set of top terms appended daily. Each set of Top 25 and Top 25 Rising expires after 30 days and is accompanied by a rolling five-year window of historical data for each country and region across the globe, where data is available.

This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
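As a rough sketch of how the top-terms table might be queried, the Python snippet below only builds a SQL string; it does not contact BigQuery. The table path `bigquery-public-data.google_trends.international_top_terms` and the column names (`term`, `rank`, `refresh_date`, `country_name`) are assumptions that do not appear in the description above and should be checked against the dataset's schema in the BigQuery console. The helper name `build_top_terms_query` is hypothetical.

```python
# Assumed table path and column names; verify them in the BigQuery console.
ASSUMED_TABLE = "bigquery-public-data.google_trends.international_top_terms"


def build_top_terms_query(country_name: str, limit: int = 25) -> str:
    """Return SQL for the most recent top terms of one country (a sketch)."""
    if limit < 1:
        raise ValueError("limit must be positive")
    # Escape backslashes and single quotes so the value stays inside the literal.
    safe_country = country_name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT term, MIN(rank) AS best_rank\n"
        f"FROM `{ASSUMED_TABLE}`\n"
        f"WHERE country_name = '{safe_country}'\n"
        f"  AND refresh_date = (SELECT MAX(refresh_date) FROM `{ASSUMED_TABLE}`)\n"
        "GROUP BY term\n"
        "ORDER BY best_rank\n"
        f"LIMIT {int(limit)}"
    )


query = build_top_terms_query("Japan")
print(query)
```

Running the query would additionally need a Google Cloud project and a BigQuery client. In real code, query parameters are generally a safer choice than string interpolation.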
Global market share of leading desktop search engines 2015-2025
statista.com
ai-chatbox.pro
Updated Apr 28, 2025
Cite
Statista (2025). Global market share of leading desktop search engines 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
Dataset updated
Apr 28, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2015 - Mar 2025
Area covered
Worldwide
Description
As of March 2025, Google represented 79.1 percent of the global online search engine market on desktop devices. Although Google remains far ahead of its competitors, this is the lowest share the search engine has recorded on these devices in over two decades. Meanwhile, its long-time competitor Bing accounted for 12.21 percent, while tools like Yahoo and Yandex held shares of over 2.9 percent each.

Google and the global search market

Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2024, with a market capitalization of 2.02 trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2024, with roughly 348.16 billion U.S. dollars.

Search engine usage in different countries

Google is the most frequently used search engine worldwide, but in some countries its alternatives lead or compete with it to some extent. As of the last quarter of 2023, more than 63 percent of internet users in Russia used Yandex, whereas Google users represented a little over 33 percent. Meanwhile, Baidu was the most used search engine in China, despite a strong decrease in the percentage of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. By the end of 2024, nearly half of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over 21 percent of users in Mexico said they used Yahoo.

Wordle Answer Search Trends Dataset (2021–2025)

kaggle.com

Updated Jun 26, 2025

Cite

Ankush Kamboj (2025). Wordle Answer Search Trends Dataset (2021–2025) [Dataset]. https://www.kaggle.com/datasets/kambojankush/wordle-answer-search-trends-dataset-20212025/discussion

Dataset updated

Jun 26, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Ankush Kamboj

License

https://www.gnu.org/licenses/gpl-3.0.html

Description

This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.

It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.

🔍 Hypothesis

Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.

This dataset supports exploration of:

Wordle answers
Search trends for Wordle answers
Correlation between Wordle answer rarity and search interest

Columns

Column	Description
`date`	Date of the Wordle puzzle
`word`	Correct 5-letter Wordle answer
`game`	Wordle game number
`wordfreq_commonality`	Normalized frequency score using Python’s `wordfreq` library
`subtlex_commonality`	Normalized frequency score using SUBTLEX-US dataset
`trend_day_global`	Google search interest on the day (global, all categories)
`trend_avg_200_global`	200-day average search interest (global, all categories)
`trend_day_language`	Search interest on Wordle day (Language Resources category)
`trend_avg_200_language`	200-day average search interest (Language Resources category)

Note: All trend values are relative (0–100 scale, per Google Trends).
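To illustrate how the columns above could be used to probe the hypothesis, here is a minimal Python sketch. The three rows are made-up placeholder values, not taken from the dataset, and the helper names `spike_ratio` and `pearson` are hypothetical; only the column names follow the table above.

```python
import csv
import io
import math

# Made-up placeholder rows (NOT real dataset values), using the documented column names.
SAMPLE_CSV = """date,word,wordfreq_commonality,trend_day_global,trend_avg_200_global
2022-01-01,alpha,0.90,20,16
2022-01-02,bravo,0.50,40,10
2022-01-03,delta,0.10,90,9
"""


def spike_ratio(row: dict) -> float:
    """Day-of search interest relative to the 200-day average."""
    average = float(row["trend_avg_200_global"])
    if average == 0:
        return float("nan")  # the ratio is undefined without a baseline
    return float(row["trend_day_global"]) / average


def pearson(xs: list, ys: list) -> float:
    """Plain Pearson correlation coefficient."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratios = [spike_ratio(r) for r in rows]
commonality = [float(r["wordfreq_commonality"]) for r in rows]
# A negative coefficient would be consistent with rarer words spiking more.
print(round(pearson(commonality, ratios), 3))
```

Because the trend values are relative, the ratio to the 200-day average is used here as one possible way to express a spike; other normalizations are equally plausible.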

🧮 Methodology

Wordle answers were scraped from wordfinder.yourdictionary.com
Commonality scores were computed using:
- wordfreq Python library
- SUBTLEX-US dataset (subtitle frequency, approximating spoken English)
Trend data was fetched using Google Trends API via pytrends

📊 Analysis

An analysis based on this data is available in the accompanying blog post.

Google user data requests from federal agencies and governments H1 2024, by...
statista.com
Updated Feb 10, 2025
Cite
Statista (2025). Google user data requests from federal agencies and governments H1 2024, by country [Dataset]. https://www.statista.com/statistics/273501/global-data-requests-from-google-by-federal-agencies-and-governments/
Dataset updated
Feb 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
In the first half of 2024, Google received over 82,000 requests for disclosure of user information from U.S. federal agencies and other government entities. The Indian government ranked second by number of user information disclosure requests sent to Google, followed by Germany.
‘Google Trends Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Google Trends Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-google-trends-dataset-540a/latest
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Google Trends Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/dhruvildave/google-trends-dataset on 30 September 2021.

--- Dataset description provided by original source is as follows ---

This is a curated dataset of Google Trends over the years. Every year, Google releases the trending search queries all over the world in various categories. It has trends from 2001 to 2020.

Image Credits: Unsplash - lukecheeser

--- Original source retains full ownership of the source dataset ---
Google Trends data on pollen searches 2012-2017
data.mendeley.com
Updated Jul 25, 2019
Cite
Jane Hall (2019). Google Trends data on pollen searches 2012-2017 [Dataset]. http://doi.org/10.17632/xpy7jykfzw.1
Unique identifier
https://doi.org/10.17632/xpy7jykfzw.1
Dataset updated
Jul 25, 2019
Authors
Jane Hall
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Google Trends data on searches for "pollen" for DMA regions near National Allergy Bureau pollen counting stations from 2012-2017, downloaded in 10x replicates, from Jan-Jun and Apr-Dec of each year. Search data for the term "ragweed" is included as a comparator in pollen searches (no file suffix), and can also be found as a separate search term (in files with the suffix "ragweed.csv")
Google Trends Keyword Data by Region and Year: Ukraine
figshare.com
Updated Sep 5, 2022
Cite
Kate Townsend (2022). Google Trends Keyword Data by Region and Year: Ukraine [Dataset]. http://doi.org/10.6084/m9.figshare.20949097.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.20949097.v1
Dataset updated
Sep 5, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kate Townsend
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Ukraine
Description
Keywords: Border, Embassy, Disconnected, Cyber security, Data protection, Malware, Hack, Cyber attack, VPN

Time period covered: January 1, 2022 to March 31, 2022
Google energy consumption 2011-2023
statista.com
ai-chatbox.pro
Updated Oct 11, 2024
Cite
Statista (2024). Google energy consumption 2011-2023 [Dataset]. https://www.statista.com/statistics/788540/energy-consumption-of-google/
Dataset updated
Oct 11, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, smart temperature and lighting, advanced cooling techniques, and machine learning.

Datacenters and energy

Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than the average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills.

Reducing Google’s carbon footprint

Google’s clean energy efforts are also related to its efforts to reduce its carbon footprint. Since committing to using 100 percent renewable energy, the company has met its targets largely through solar and wind energy power purchase agreements and by buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.
Reasons for switching search engines in the U.S. 2019
statista.com
Updated Dec 5, 2022
Cite
Statista (2022). Reasons for switching search engines in the U.S. 2019 [Dataset]. https://www.statista.com/statistics/1218794/reasons-for-switching-search-engines-us/
Dataset updated
Dec 5, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 2019
Area covered
United States
Description
Based on a survey conducted in 2019 among internet users in the United States, the largest share of adults (36 percent) said they would switch search engines if it meant getting better quality results. Furthermore, 33.7 percent stated that knowing their data was not being collected by a platform would also encourage them to make the switch. Other factors listed included 'having fewer ads' and a well-designed interface. Overall, there was a noticeable lean toward search result quality and data privacy when it came to search engine selection.

Google leads despite user preference for increased privacy

Despite a strong consumer call for data protection, Google topped the list of search engines, with 93 percent of Americans surveyed reporting having used the popular search giant at some point during the past 4 weeks. In comparison, the second most popular platform, Yahoo!, had been used by only 31 percent of those surveyed. Meanwhile, DuckDuckGo, the search engine best known for protecting user data and search history, had been used by only 8 percent. Mobile search figures lean even more in Google's favor. Here, a similar share (93 percent) of the market as of January 2021 belonged to Google, while approximately 3 percent was held by DuckDuckGo.

Growth expected for search advertising

With search engines playing a significant role in internet use be it on desktop or mobile, companies and search platforms alike are seeing an increased opportunity in the field of search engine advertising. Nationwide spend in the industry reached an impressive 58.2 billion U.S. dollars in 2020, and was forecast to further rise to 66.2 billion within the following year.
Transparency in Keyword Faceted Search: a dataset of Google Shopping html...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Hoang Van Tien (2020). Transparency in Keyword Faceted Search: a dataset of Google Shopping html pages [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1491556
Dataset updated
Jan 24, 2020
Dataset provided by
De Nicola Rocco
Hoang Van Tien
Cozza Vittoria
Petrocchi Marinella
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains a collection of around 2,000 HTML pages. These pages contain the search results returned for queries on different products, issued by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.

Each file name encodes the location from which the search was made, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html

The locations are Philippines (PHI), United States (US), India (IN). The userIDs: 26 to 30 for users searching from Philippines, 1 to 5 from US, 11 to 15 from India.
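The naming scheme above lends itself to a small parser. The following Python sketch is one possible reading of the pattern: it assumes that `#` stands for a run of digits, and the example file name is hypothetical, built only from the documented pattern. The names `FILENAME_RE`, `USER_IDS`, and `parse_filename` are illustrative.

```python
import re

# Pattern from the description: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
# Assumption: "#" is a run of digits.
FILENAME_RE = re.compile(
    r"^no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)"
    r"\.(?P<product>.+)\.shopping_testing\.(?P<index>\d+)\.html$"
)

# User ID ranges per location, as listed in the description.
USER_IDS = {"PHI": range(26, 31), "US": range(1, 6), "IN": range(11, 16)}


def parse_filename(name: str) -> dict:
    """Split a result-page file name into its documented parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    location = match["location"]
    user_id = int(match["user_id"])
    if user_id not in USER_IDS.get(location, ()):
        raise ValueError(f"user {user_id} is not listed for location {location!r}")
    return {
        "location": location,
        "user_id": user_id,
        "product": match["product"],
        "index": int(match["index"]),
    }


# Hypothetical file name, built only from the documented pattern.
parsed = parse_filename("no_email_US_3.Television.shopping_testing.1.html")
print(parsed)
```

Validating the user ID against its location is optional, but it catches files that were renamed or misfiled.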

Products were chosen according to 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television).

In the following, we describe how the search results have been collected.

Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.

To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.

A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.

The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).

Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automatised with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each of them with their own associated cookies.

The experiments run for 24 hours on average. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., to India) via tunneling in SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.

Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.

The search results have been analyzed to check whether there was evidence of price steering based on users' location.

One term of usage applies:

In any research product whose findings are based on this dataset, please cite

@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Sri Lanka Google Search Trends: Computer & Electronics: Samsung Electronics
ceicdata.com
Updated Nov 21, 2022
Cite
CEICdata.com (2022). Sri Lanka Google Search Trends: Computer & Electronics: Samsung Electronics [Dataset]. https://www.ceicdata.com/en/sri-lanka/google-search-trends-by-categories
Dataset updated
Nov 21, 2022
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Mar 8, 2025 - Mar 19, 2025
Area covered
Sri Lanka
Description
Google Search Trends: Computer & Electronics: Samsung Electronics data was reported at 59.000 Score in 15 May 2025. This records a decrease from the previous number of 67.000 Score for 14 May 2025. Google Search Trends: Computer & Electronics: Samsung Electronics data is updated daily, averaging 55.500 Score from Dec 2021 (Median) to 15 May 2025, with 1262 observations. The data reached an all-time high of 100.000 Score in 23 Dec 2023 and a record low of 0.000 Score in 02 Jul 2023. Google Search Trends: Computer & Electronics: Samsung Electronics data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Sri Lanka – Table LK.Google.GT: Google Search Trends: by Categories.
Uzbekistan Google Search Trends: Travel & Accommodations: Airbnb
ceicdata.com
Updated Mar 19, 2025
Cite
CEICdata.com (2025). Uzbekistan Google Search Trends: Travel & Accommodations: Airbnb [Dataset]. https://www.ceicdata.com/en/uzbekistan/google-search-trends-by-categories/google-search-trends-travel--accommodations-airbnb
Dataset updated
Mar 19, 2025
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Mar 8, 2025 - Mar 19, 2025
Area covered
Uzbekistan
Description
Uzbekistan Google Search Trends: Travel & Accommodations: Airbnb data was reported at 3.000 Score in 14 May 2025. This stayed constant from the previous number of 3.000 Score for 13 May 2025. Uzbekistan Google Search Trends: Travel & Accommodations: Airbnb data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 23.000 Score in 22 Jul 2023 and a record low of 0.000 Score in 02 May 2025. Uzbekistan Google Search Trends: Travel & Accommodations: Airbnb data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Uzbekistan – Table UZ.Google.GT: Google Search Trends: by Categories.
Google Landmarks Dataset v2
github.com
paperswithcode.com
Updated Sep 27, 2019
Cite
Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
Dataset updated
Sep 27, 2019
Dataset provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
Leading Google search queries in India 2024
statista.com
Updated Jun 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Leading Google search queries in India 2024 [Dataset]. https://www.statista.com/statistics/1108790/india-most-searched-keywords-google/
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
India
Description
According to Google's search data from 2024, the most common queries by Indians were ****** and ********. These common search queries provide an insight into the Indian content consumption patterns. Video content was sought actively, also evident by the fact that the video-sharing platform had the highest share of monthly social network users in the country that same year.

Optimization of video streaming

Not surprisingly, video content is witnessing exponential growth in recent years. It is evident in the fact that video streaming accounts for a major share of online mobile traffic across the nation. Recent trends suggest an increase in consumption of video over graphic or text content. Hence, a sound implementation of SEO in videos has become a necessity for a successful content creating channel. One of the major optimization strategies is to cater to the demographic of the nation, which incorporates efficient description, headline, and tag implementation.

Keyword search trends

Searches related to local preferences are gaining momentum, rendering local SEO invaluable to promoting visibility of the content. Phrases like “near me” and “close to me” have witnessed a significant increase in their frequency of appearances in queries. Since the coronavirus (COVID-19) outbreak, the latter part of 2020 has seen a significant rise in the usage of queries related to the pandemic. This is testament to the influence of recent events on keywords and optimized phrases for improved channel visibility.
Repository Analytics and Metrics Portal (RAMP) 2020 data
data.niaid.nih.gov
datadryad.org
Updated Jul 23, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2020 data [Dataset]. http://doi.org/10.5061/dryad.dv41ns1z4
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.dv41ns1z4
Dataset updated
Jul 23, 2021
Dataset provided by
University of New Mexico
Montana State University
Authors
Jonathan Wheeler; Kenning Arlitsch
License
https://spdx.org/licenses/CC0-1.0.html
Description
Version update: The originally uploaded versions of the CSV files in this dataset included an extra column, "Unnamed: 0," which is not RAMP data and was an artifact of the process used to export the data to CSV format. This column has been removed from the revised dataset. The data are otherwise the same as in the first version.

The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2020. For a description of the data collection, processing, and output methods, please see the "Methods" section below.

Methods

Data Collection

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR: one index includes the page level data, and the second includes the country of origin and device type data.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
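The two steps above can be sketched in a few lines of Python. The rows below are made-up values, not real RAMP data, and the sketch assumes that the date field holds ISO-formatted dates (YYYY-MM-DD), which the description does not state. The function name `citable_content_downloads` is hypothetical.

```python
import csv
import io

# Made-up page-level rows (NOT real RAMP data) with the documented field names.
SAMPLE_CSV = """url,clicks,impressions,citableContent,date
https://example.org/item/1,2,40,No,2020-01-01
https://example.org/item/1/paper.pdf,5,50,Yes,2020-01-01
https://example.org/item/2/data.csv,3,10,Yes,2020-01-02
https://example.org/item/2/data.csv,4,20,Yes,2020-02-01
"""


def citable_content_downloads(rows, start: str, end: str) -> int:
    """Sum clicks on rows flagged as citable content within [start, end]."""
    return sum(
        int(row["clicks"])
        for row in rows
        # Assumed ISO dates (YYYY-MM-DD) compare correctly as strings.
        if row["citableContent"] == "Yes" and start <= row["date"] <= end
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
january_ccd = citable_content_downloads(rows, "2020-01-01", "2020-01-31")
print(january_ccd)
```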

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

As a result, two CSV datasets are provided for each month of published data:

page-clicks:

The data in these CSV files correspond to the page-level data, and include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data end with “page-clicks”. For example, the file named 2020-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2020.

country-device-info:

The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
index: The Elasticsearch index corresponding to country and device access information data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data end with “country-device-info”. For example, the file named 2020-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2020.
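
The naming scheme in the two filename examples can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and it assumes that every month follows the same YYYY-MM_RAMP_all_<kind>.csv pattern as the two January examples.

```python
from datetime import date

# The two dataset kinds named in the description.
DATASET_KINDS = ("page-clicks", "country-device-info")


def ramp_csv_name(year, month, kind):
    """Return a monthly filename such as 2020-01_RAMP_all_page-clicks.csv."""
    if kind not in DATASET_KINDS:
        raise ValueError(f"unknown dataset kind: {kind!r}")
    # date() rejects impossible months, so bad input fails early.
    first_day = date(year, month, 1)
    # Assumption: all months share the pattern of the January examples.
    return f"{first_day:%Y-%m}_RAMP_all_{kind}.csv"


def monthly_files(year):
    """List the (page-clicks, country-device-info) filename pair for each month."""
    return [
        (ramp_csv_name(year, month, "page-clicks"),
         ramp_csv_name(year, month, "country-device-info"))
        for month in range(1, 13)
    ]
```

Because the two sets cannot be combined, the sketch keeps them as a pair per month rather than merging them into one list.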

References

Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2
analyst-2.ai
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-how-every-nfl-teams-fans-lean-politically-550a/f911ccf2/?iid=003-030&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘How Every NFL Team’s Fans Lean Politically?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nfl-fandome on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Data behind the story How Every NFL Team’s Fans Lean Politically.

Google Trends Data

Google Trends data was derived from comparing 5-year search traffic for the 7 sports leagues we analyzed:

https://g.co/trends/5P8aa

Results are listed by designated market area (DMA).

The percentages are the approximate percentage of major-sports searches that were conducted for each league.

Trump's percentage is his share of the vote within the DMA in the 2016 presidential election.

SurveyMonkey Data

SurveyMonkey data was derived from a poll of American adults ages 18 and older, conducted between Sept. 1-7, 2017.

Listed numbers are the raw totals for respondents who ranked a given NFL team among their three favorites, and how many identified with a given party (further broken down by race). We also list the percentages of the entire sample that identified with each party, and were of each race.

The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

Source: https://github.com/fivethirtyeight/data

This dataset was created by FiveThirtyEight and contains around 0 samples along with Unnamed: 10, Unnamed: 4, technical information and other features such as: - Unnamed: 3 - Unnamed: 1 - and more.

How to use this dataset

Analyze Unnamed: 13 in relation to Unnamed: 21

Study the influence of Unnamed: 7 on Unnamed: 12


Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight


--- Original source retains full ownership of the source dataset ---
Data from: Repository Analytics and Metrics Portal (RAMP) 2021 data
data.niaid.nih.gov
zenodo.org
Updated May 23, 2023
Jonathan Wheeler; Kenning Arlitsch (2023). Repository Analytics and Metrics Portal (RAMP) 2021 data [Dataset]. http://doi.org/10.5061/dryad.1rn8pk0tz
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.1rn8pk0tz
Dataset updated
May 23, 2023
Dataset provided by
University of New Mexico
Montana State University
Authors
Jonathan Wheeler; Kenning Arlitsch
License
https://spdx.org/licenses/CC0-1.0.html
Description
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). This dataset is a subset of RAMP data (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2021. For a description of the data collection, processing, and output methods, please see the "Methods" section below.

The record will be revised periodically to make new data available through the remainder of 2021.

Methods

Data Collection

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.

The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:

country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
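
One way such a flag could be derived is sketched below in Python. The description does not say how RAMP inspects URLs, so the file-extension check, the function name, and the extension list are all assumptions made for illustration; only PDF and CSV are named in the text.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: illustrative extension list. The description names only
# "PDF, CSV, etc."; a production list would be longer.
CONTENT_EXTENSIONS = {".pdf", ".csv"}


def citable_content(url):
    """Return "Yes" if the URL path ends in a content-file extension, else "No"."""
    # Use only the path so query strings such as ?sequence=1 are ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return "Yes" if extension in CONTENT_EXTENSIONS else "No"
```

For instance, citable_content("https://repo.example.edu/bitstream/1/2/thesis.PDF?sequence=1") yields "Yes", while an HTML landing page such as "https://repo.example.edu/handle/1/2" yields "No" (both URLs are made up).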

The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
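
The two steps can be sketched in Python with the standard csv module. The sample rows and URLs are invented for illustration, and the sketch assumes that the date field begins with an ISO-style YYYY-MM-DD value, which the description does not confirm.

```python
import csv
import io
from datetime import date


def citable_content_downloads(rows, start, end):
    """Sum clicks on citable-content rows dated between start and end, inclusive."""
    total = 0
    for row in rows:
        # Assumption: the date field starts with YYYY-MM-DD.
        day = date.fromisoformat(row["date"][:10])
        if start <= day <= end and row["citableContent"] == "Yes":
            # int(float(...)) tolerates values exported as "3" or "3.0".
            total += int(float(row["clicks"]))
    return total


# Invented sample data using the page-clicks field names.
SAMPLE = """url,clicks,date,citableContent
https://repo.example.edu/a.pdf,3,2021-01-05,Yes
https://repo.example.edu/handle/1,7,2021-01-05,No
https://repo.example.edu/b.csv,2,2021-01-20,Yes
https://repo.example.edu/a.pdf,4,2021-02-01,Yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
january_ccd = citable_content_downloads(rows, date(2021, 1, 1), date(2021, 1, 31))
print(january_ccd)  # 5: the two January rows flagged "Yes" (3 + 2)
```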

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.

As a result, two CSV datasets are provided for each month of published data:

page-clicks:

The data in these CSV files correspond to the page-level data, and include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data end with “page-clicks”. For example, the file named 2021-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2021.

country-device-info:

The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:

country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
index: The Elasticsearch index corresponding to country and device access information data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data end with “country-device-info”. For example, the file named 2021-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2021.
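
The recommendation to filter and aggregate on repository_id rather than index can be illustrated with a short Python sketch. The index and repository names below are invented. Recomputing the click-through ratio from the summed clicks and impressions is a choice made here, not something the description specifies.

```python
from collections import defaultdict


def totals_by_repository(rows):
    """Aggregate clicks and impressions per repository_id, ignoring index names."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for row in rows:
        bucket = totals[row["repository_id"]]
        bucket["clicks"] += int(float(row["clicks"]))
        bucket["impressions"] += int(float(row["impressions"]))
    result = {}
    for repo, sums in totals.items():
        # Ratio of the sums; 0.0 when there are no impressions (a choice made here).
        ratio = sums["clicks"] / sums["impressions"] if sums["impressions"] else 0.0
        result[repo] = {**sums, "clickThrough": ratio}
    return result


# Invented rows: one repository appears under two different index names.
rows = [
    {"index": "example_ir_v1", "repository_id": "example_ir", "clicks": "2", "impressions": "10"},
    {"index": "example_ir_v2", "repository_id": "example_ir", "clicks": "3", "impressions": "40"},
    {"index": "other_ir", "repository_id": "other_ir", "clicks": "1", "impressions": "4"},
]

summary = totals_by_repository(rows)
```

Grouping on index instead would split example_ir into two partial totals, which is the inconsistency the repository_id field was added to avoid.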

References

Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
Evolution of Web search engine interfaces through SERP screenshots and HTML...
rdm.inesctec.pt
Updated Jul 26, 2021
(2021). Evolution of Web search engine interfaces through SERP screenshots and HTML complete pages for 20 years - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2021-003
Dataset updated
Jul 26, 2021
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of “10 blue links” has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. We used the most searched queries by year to extract a representative sample of SERP from the Internet Archive. The Internet Archive has been keeping snapshots and the respective HTML version of webpages over time, and its collection contains more than 50 billion webpages. We used Python and Selenium Webdriver, for browser automation, to visit each capture online, check if the capture is valid, save the HTML version, and generate a full screenshot. The dataset contains all the extracted captures. Each capture is represented by a screenshot, an HTML file, and a files' folder. We concatenate the initial of the search engine (G) with the capture's timestamp for file naming. The filename ends with a sequential integer "-N" if the timestamp is repeated. For example, "G20070330145203-1" identifies a second capture from Google on March 30, 2007. The first is identified by "G20070330145203". Using this dataset, we analyzed how SERP evolved in terms of content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We have registered the appearance of SERP features and analyzed the design patterns involved in each SERP component. We found that the number of elements in SERP has been rising over the years, demanding a more extensive interface area and larger files. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, web design. We expect this work will trigger other, more specific studies that can take advantage of the dataset we provide here. This graphic represents the diversity of captures by year and search engine (Google and Bing).
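
The capture naming convention described above can be parsed with a short Python sketch. The class and function names are hypothetical, and the 14-digit YYYYMMDDhhmmss timestamp layout is an assumption inferred from the single example "G20070330145203-1"; the description does not state it explicitly.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Capture(NamedTuple):
    engine_initial: str   # e.g. "G" for Google
    timestamp: datetime   # capture time taken from the name
    sequence: int         # 0 for the first capture, N for a "-N" suffix


# Assumption: one uppercase initial, 14 digits, optional "-N" suffix.
_NAME = re.compile(r"^([A-Z])(\d{14})(?:-(\d+))?$")


def parse_capture_name(name):
    """Split a capture name into engine initial, timestamp, and sequence number."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a capture name: {name!r}")
    initial, stamp, suffix = match.groups()
    return Capture(
        engine_initial=initial,
        timestamp=datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        sequence=int(suffix) if suffix else 0,
    )
```

Under these assumptions, "G20070330145203" and "G20070330145203-1" parse to the same engine and timestamp with sequence numbers 0 and 1, matching the first and second captures in the example.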
U.S.: google search year-over-year growth by car brands 2021-2023
statista.com
Updated Jul 9, 2025
Statista (2025). U.S.: google search year-over-year growth by car brands 2021-2023 [Dataset]. https://www.statista.com/statistics/1398313/us-google-search-yoy-growth-selected-car-brands/
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista http://statista.com/
Time period covered
May 2020 - Apr 2023
Area covered
United States
Description
According to data collected by Pi Datametrics, Google searches for Toyota in the United States increased by **** percent year-over-year between the year ending in April 2022 and the year ending in April 2023. Between May 2022 and April 2023, Toyota was the most searched-for car brand on Google in the United States.

Google Trends Dataset

A dataset of all the Google Trends all over the world

