The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. This dataset is intended to help researchers better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. To visualize the data, try exploring these interactive charts and map of symptom search trends. As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
This is the US Coronavirus data repository from The New York Times. This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here. For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site.
Which US counties have the most confirmed cases per capita?
This query determines which counties have the most cases per 100,000 residents. Note that the results may differ from similar queries of other datasets because of differences in reporting lag, methodology, or other dataset differences.
SELECT
  covid19.county,
  covid19.state_name,
  total_pop AS county_population,
  confirmed_cases,
  ROUND(confirmed_cases / total_pop * 100000, 2) AS confirmed_cases_per_100000,
  deaths,
  ROUND(deaths / total_pop * 100000, 2) AS deaths_per_100000
FROM
  `bigquery-public-data.covid19_nyt.us_counties` covid19
JOIN
  `bigquery-public-data.census_bureau_acs.county_2017_5yr` acs
  ON covid19.county_fips_code = acs.geo_id
WHERE
  date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND covid19.county_fips_code != "00000"
ORDER BY
  confirmed_cases_per_100000 DESC
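The per-capita arithmetic in the query above can be checked locally with a minimal Python sketch. The county names and counts below are hypothetical placeholders, not values from the dataset, and `per_100k` is an illustrative helper rather than part of any library.

```python
def per_100k(count: int, population: int) -> float:
    """Return count per 100,000 residents, rounded to 2 decimals as in the query."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(count / population * 100_000, 2)


# Hypothetical rows: (county, confirmed_cases, total_pop)
rows = [
    ("County A", 250, 50_000),
    ("County B", 1_200, 800_000),
    ("County C", 30, 2_000),
]

# Rank counties by cases per 100,000 residents, highest first.
ranked = sorted(
    ((name, per_100k(cases, pop)) for name, cases, pop in rows),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, rate in ranked:
    print(f"{name}: {rate} cases per 100,000")
```

Note how the smallest county ranks first even though it has the fewest cases; this is why the query normalizes by population before sorting.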
How do I calculate the number of new COVID-19 cases per day?
This query determines the number of new cases in each state for each day available in the dataset. It joins each day's cumulative count to the previous day's count and takes the difference.
SELECT
  b.state_name,
  b.date,
  MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases
FROM
  (SELECT
     state_name AS state,
     state_fips_code,
     confirmed_cases,
     DATE_ADD(date, INTERVAL 1 DAY) AS date_shift
   FROM
     `bigquery-public-data.covid19_nyt.us_states`
   WHERE
     confirmed_cases + deaths > 0) a
JOIN
  `bigquery-public-data.covid19_nyt.us_states` b
  ON a.state_fips_code = b.state_fips_code
  AND a.date_shift = b.date
GROUP BY
  b.state_name, date
ORDER BY
  date DESC
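The self-join above pairs each day with the previous calendar day and subtracts the cumulative counts. A minimal Python sketch of the same idea follows; the series is hypothetical and `daily_new_cases` is an illustrative helper, not part of the dataset or of BigQuery.

```python
from datetime import date, timedelta
from typing import Dict


def daily_new_cases(cumulative: Dict[date, int]) -> Dict[date, int]:
    """For each day whose previous calendar day is present, return the day-over-day increase."""
    new_cases = {}
    for day, total in cumulative.items():
        previous = day - timedelta(days=1)
        # Like the join on date_shift, days without a preceding row are skipped.
        if previous in cumulative:
            new_cases[day] = total - cumulative[previous]
    return new_cases


# Hypothetical cumulative confirmed cases for one state (note the gap on March 4).
series = {
    date(2020, 3, 1): 2,
    date(2020, 3, 2): 5,
    date(2020, 3, 3): 5,
    date(2020, 3, 5): 9,
}

result = daily_new_cases(series)
for day in sorted(result):
    print(day.isoformat(), result[day])
```

A day that follows a gap in the data produces no row at all, so missing dates in the source show up as missing dates in the output rather than as inflated daily counts.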
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States Google Search Trends: Government Measures: Government Subsidy data was reported at 0.000 Score in 14 May 2025. This stayed constant from the previous number of 0.000 Score for 13 May 2025. United States Google Search Trends: Government Measures: Government Subsidy data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 0.000 Score in 14 May 2025 and a record low of 0.000 Score in 14 May 2025. United States Google Search Trends: Government Measures: Government Subsidy data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s United States – Table US.Google.GT: Google Search Trends: by Categories.
https://creativecommons.org/publicdomain/zero/1.0/
Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the blockchain to store transactions in a distributed manner in order to mitigate flaws in the financial industry.
Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.
In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset. It is updated every 10 minutes. The data can be joined with historical prices in kernels. See available similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
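As a rough illustration of querying this dataset from Python, the sketch below builds a row-count query and estimates its cost with a dry run before anything is executed. It assumes the `google-cloud-bigquery` package is installed and that credentials are configured in the environment; `build_count_query` and `estimate_bytes` are illustrative helper names, not part of the client library.

```python
def build_count_query(table_name: str) -> str:
    """Build a row-count query for one table in the crypto_bitcoin public dataset."""
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (
        "SELECT COUNT(*) AS row_count "
        f"FROM `bigquery-public-data.crypto_bitcoin.{table_name}`"
    )


def estimate_bytes(sql: str) -> int:
    """Dry-run the query and return the number of bytes it would process."""
    # Imported here so the query builder works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # nothing is executed or billed
    return job.total_bytes_processed


sql = build_count_query("transactions")
print(sql)
# With credentials available: print(estimate_bytes(sql))
```

Checking the dry-run estimate first is a cheap way to stay inside the free processing tier when exploring large tables.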
Allen Day, Google Cloud Developer Advocate, and Colin Bookman, Google Cloud Customer Engineer, retrieve data from the Bitcoin network using a custom client available on GitHub that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk to two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as new rows are now appended whenever new blocks are broadcast to the Bitcoin network. For additional information visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".
Photo by Andre Francois on Unsplash.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A selection from https://huggingface.co/datasets/songweig/imagenet_sketch (I actually downloaded it from https://opendatalab.com/OpenDataLab/ImageNet-Sketch; the hashes are the same).
The original ImageNet-Sketch data set consists of 50000 images, 50 images for each of the 1000 ImageNet classes. We construct the data set with Google Image queries "sketch of _", where _ is the standard class name. We only search within the "black and white" color scheme. We initially query 100 images for every class… See the full description on the dataset page: https://huggingface.co/datasets/tumuyan2/ImageNet-Sketch-HQ.
https://creativecommons.org/publicdomain/zero/1.0/
BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.
gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.
github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.
github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.
natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.
shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.
trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.
wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.
Fork this kernel to get started.
Data Source: https://cloud.google.com/bigquery/sample-tables
Banner Photo by Mervyn Chan from Unsplash.
How many babies were born in New York City on Christmas Day?
How many words are in the play Hamlet?
https://choosealicense.com/licenses/other/
Drive Stats
Drive Stats is a public data set of daily metrics on the hard drives in Backblaze’s cloud storage infrastructure that Backblaze has open-sourced since April 2013. Currently, Drive Stats comprises over 388 million records, rising by over 240,000 records per day. Drive Stats is an append-only dataset effectively logging daily statistics that once written are never updated or deleted. This is our first Hugging Face dataset; feel free to suggest improvements by creating a… See the full description on the dataset page: https://huggingface.co/datasets/backblaze/Drive_Stats.
Weather Source, a leading provider of weather and climate technologies for business intelligence, is offering complimentary data for those researching the potential connections between weather and COVID-19 viability and transmission. This share includes global historical weather data dating back to October 2019, present data, and forecast data out to 15 days. The data supports temperature and humidity, both specific and relative, at the daily level. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. This dataset is created and owned by Weather Source and made available for educational and academic research purposes. This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.
The dataset of this paper was collected from Google, Blockchain, and the Bitcoin market. There are 26 features in total; however, any feature whose variations have a correlation lower than 0.3 with the variations of price has been eliminated. Hence, a total of 21 practical features have been selected, including Market capitalization, Trade-volume, Transaction-fees USD, Average confirmation time, Difficulty, High price, Low price, Total hash rate, Block-size, Miners-revenue, N-transactions-total, Google searches, Open price, N-payments-per Block, Total circulating Bitcoin, Cost-per-transaction percent, Fees-USD-per transaction, N-unique-addresses, N-transactions-per block, and Output-volume. In addition to the values of these features, a supportive feature is created for each one that holds the difference between the previous day and the day before the previous day. In terms of size and history, a total of 1275 training samples, collected from 12 Nov 2018 to 4 Jun 2021, were used in the proposed model to extract patterns of Bitcoin price.
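The supportive difference feature described above can be sketched as follows. This is one reading of the description, assuming that the feature for day t is the value on day t-1 minus the value on day t-2; the prices are hypothetical and `lagged_difference` is an illustrative name, not the authors' code.

```python
from typing import List, Optional


def lagged_difference(values: List[float]) -> List[Optional[float]]:
    """For day t, return values[t-1] - values[t-2]; the first two days have no such pair."""
    out: List[Optional[float]] = [None, None][: len(values)]
    for t in range(2, len(values)):
        out.append(values[t - 1] - values[t - 2])
    return out


# Hypothetical daily open prices.
open_price = [100.0, 104.0, 101.0, 107.0, 110.0]
diff_feature = lagged_difference(open_price)
print(diff_feature)
```

Under this reading the feature only uses information available before day t, so it can be fed to a model that predicts the price on day t without leaking that day's value.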
https://creativecommons.org/publicdomain/zero/1.0/
Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries.
Over 9000 stations' data are typically available.
The daily elements included in the dataset (as available from each station) are:
* Mean temperature (.1 Fahrenheit)
* Mean dew point (.1 Fahrenheit)
* Mean sea level pressure (.1 mb)
* Mean station pressure (.1 mb)
* Mean visibility (.1 miles)
* Mean wind speed (.1 knots)
* Maximum sustained wind speed (.1 knots)
* Maximum wind gust (.1 knots)
* Maximum temperature (.1 Fahrenheit)
* Minimum temperature (.1 Fahrenheit)
* Precipitation amount (.01 inches)
* Snow depth (.1 inches)
Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_gsod.[TABLENAME]. Fork this kernel to get started and learn how to safely manage analyzing large BigQuery datasets.
This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations. Dataset Source: NOAA
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Photo by Allan Nygren on Unsplash
This dataset contains a set of daily time series representing the percentage changes of 6 aspects due to COVID-19: retail/recreation, grocery/pharmacy, parks, workplaces, residential and transit stations in a set of countries and regions. This file contains 559 daily time series which represent the average percentage changes of the above 6 aspects in 131 countries. The original dataset contains missing values and they have been replaced by zeros. Reference: Google 2021. COVID-19 Community Mobility Reports. Accessed: 2021-04-06. URL https://www.google.com/covid19/mobility/
NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. Two GHCN datasets are available in BigQuery, the GHCN-D (daily) and the GHCN-M (monthly). The GHCN-Daily is an integrated database of daily climate summaries from land surface stations across the globe. It comprises daily climate records from over 100,000 stations in 180 countries and territories, and includes some data from every year since 1763. For a complete description of data variables available in this dataset, see NOAA’s GHCN-D readme. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.
The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill out a form for the course, or watch some videos. When these people fill out a form providing their email address or phone number, they are classified as a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.
Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most promising leads, also known as ‘Hot Leads’. If they successfully identify this set of leads, the lead conversion rate should go up, as the sales team will now be focusing more on communicating with the promising leads rather than making calls to everyone.
A lot of leads are generated in the initial stage (the top of the funnel), but only a few of them come out as paying customers at the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc.) in order to get a higher lead conversion.
X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model that assigns a lead score to each lead, such that customers with a higher lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.
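To make the goal concrete, the sketch below shows how a lead score could be used to focus on hot leads and how the conversion rate among contacted leads changes as a result. The scores, outcomes, and the threshold of 75 are hypothetical values chosen for illustration; they are not taken from the dataset.

```python
from typing import List, Tuple

Lead = Tuple[float, bool]  # (lead score from 0 to 100, converted?)


def conversion_rate(leads: List[Lead]) -> float:
    """Share of the given leads that converted."""
    if not leads:
        raise ValueError("no leads")
    return sum(1 for _, converted in leads if converted) / len(leads)


def hot_leads(leads: List[Lead], threshold: float) -> List[Lead]:
    """Keep only the leads whose score is at or above the threshold."""
    return [lead for lead in leads if lead[0] >= threshold]


# Hypothetical scored leads with their eventual outcomes.
leads = [
    (95, True), (90, True), (85, True), (80, False), (75, True),
    (40, False), (35, False), (30, False), (20, False), (10, False),
]

overall = conversion_rate(leads)
focused = conversion_rate(hot_leads(leads, threshold=75))
print(f"all leads: {overall:.0%}, hot leads only: {focused:.0%}")
```

The trade-off is that a higher threshold raises the conversion rate among contacted leads but leaves more leads uncontacted, so the threshold is a business decision as much as a modelling one.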
Variables Description
* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer wherein they select whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer wherein they select whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization' which means the customer had not selected this option while filling the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for doing this course.
* Search - Indicating whether the customer had seen the ad in any of the listed items.
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of the lead based on the data and the intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.
UpGrad Case Study
This dataset contains cloud-to-ground lightning strike information collected by Vaisala's National Lightning Detection Network and aggregated into 0.1 x 0.1 degree tiles by the experts at the National Centers for Environmental Information (NCEI) as part of their Severe Weather Data Inventory. This data provides historical cloud-to-ground data aggregated into tiles that are roughly 11 km on a side for redistribution. This provides users with the number of lightning strikes each day, as well as the center point for each tile. The sample queries below will help you get started using BigQuery's GIS capabilities to analyze the data. For more on BigQuery GIS, see the documentation available here. The data begins in 1987 and runs through the present day, with a delay of a few days for processing. For near real-time lightning information, see the Cloud Public Data's metadata listing of GOES-16 data for cloud-to-cloud and cloud-to-ground strikes over the eastern half of the western hemisphere. GOES-17 data covering the western half of the western hemisphere will be available soon. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
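The "roughly 11 km" figure for a 0.1 degree tile can be checked with a short calculation. The sketch below assumes a spherical Earth with about 111.32 km per degree of latitude; that constant and the helper name `tile_size_km` are assumptions made for illustration and are not part of the dataset.

```python
import math
from typing import Tuple

KM_PER_DEGREE_LATITUDE = 111.32  # approximate; assumes a spherical Earth


def tile_size_km(latitude_deg: float, tile_deg: float = 0.1) -> Tuple[float, float]:
    """Return (north-south km, east-west km) spanned by a tile centred at the given latitude."""
    north_south = tile_deg * KM_PER_DEGREE_LATITUDE
    # Meridians converge toward the poles, so east-west extent shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_equator, ew_equator = tile_size_km(0.0)
ns_40, ew_40 = tile_size_km(40.0)
print(f"equator: {ns_equator:.1f} km x {ew_equator:.1f} km")
print(f"40 degrees north: {ns_40:.1f} km x {ew_40:.1f} km")
```

The north-south extent stays near 11 km everywhere, while the east-west extent narrows away from the equator, so tiles at mid-latitudes cover noticeably less area than tiles in the tropics.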
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tanzania Google Search Trends: Travel & Accommodations: Booking.com data was reported at 8.000 Score in 15 May 2025. This records an increase from the previous number of 6.000 Score for 14 May 2025. Tanzania Google Search Trends: Travel & Accommodations: Booking.com data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 15 May 2025, with 1262 observations. The data reached an all-time high of 82.000 Score in 04 Aug 2022 and a record low of 0.000 Score in 03 May 2025. Tanzania Google Search Trends: Travel & Accommodations: Booking.com data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Tanzania – Table TZ.Google.GT: Google Search Trends: by Categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cambodia Google Search Trends: Computer & Electronics: Apple data was reported at 30.000 Score in 14 May 2025. This records a decrease from the previous number of 36.000 Score for 13 May 2025. Cambodia Google Search Trends: Computer & Electronics: Apple data is updated daily, averaging 28.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 100.000 Score in 10 Sep 2024 and a record low of 0.000 Score in 06 Mar 2023. Cambodia Google Search Trends: Computer & Electronics: Apple data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Cambodia – Table KH.Google.GT: Google Search Trends: by Categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sri Lanka Google Search Trends: Online Games: Call of Duty data was reported at 0.000 Score in 14 May 2025. This stayed constant from the previous number of 0.000 Score for 13 May 2025. Sri Lanka Google Search Trends: Online Games: Call of Duty data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 21.000 Score in 01 Dec 2021 and a record low of 0.000 Score in 14 May 2025. Sri Lanka Google Search Trends: Online Games: Call of Duty data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Sri Lanka – Table LK.Google.GT: Google Search Trends: by Categories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nigeria Google Search Trends: Online Shopping: Tmall data was reported at 0.000 Score in 14 May 2025. This stayed constant from the previous number of 0.000 Score for 13 May 2025. Nigeria Google Search Trends: Online Shopping: Tmall data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 31.000 Score in 10 Feb 2023 and a record low of 0.000 Score in 14 May 2025. Nigeria Google Search Trends: Online Shopping: Tmall data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Nigeria – Table NG.Google.GT: Google Search Trends: by Categories.