14 datasets found

Wordle Answer Search Trends Dataset (2021–2025)

kaggle.com

Updated Jun 26, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Ankush Kamboj (2025). Wordle Answer Search Trends Dataset (2021–2025) [Dataset]. https://www.kaggle.com/datasets/kambojankush/wordle-answer-search-trends-dataset-20212025/discussion

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jun 26, 2025

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Ankush Kamboj

License

https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html

Description

This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.

It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.

🔍 Hypothesis

Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.

This dataset supports exploration of:

Wordle Answers
Trends for wordle answers
Correlation between wordle answer rarity and search interest

Columns

Column	Description
`date`	Date of the Wordle puzzle
`word`	Correct 5-letter Wordle answer
`game`	Wordle game number
`wordfreq_commonality`	Normalized frequency score using Python’s `wordfreq` library
`subtlex_commonality`	Normalized frequency score using SUBTLEX-US dataset
`trend_day_global`	Google search interest on the day (global, all categories)
`trend_avg_200_global`	200-day average search interest (global, all categories)
`trend_day_language`	Search interest on Wordle day (Language Resources category)
`trend_avg_200_language`	200-day average search interest (Language Resources category)

Notes: - All trend values are relative (0–100 scale, per Google Trends)

🧮 Methodology

Wordle answers were scraped from wordfinder.yourdictionary.com
Commonality scores were computed using:
- wordfreq Python library
- SUBTLEX-US dataset (subtitle frequency, approximating spoken English)
Trend data was fetched using Google Trends API via pytrends

📊 Analysis

Can find analysis done using this data in the blog post

Drive_Stats
huggingface.co
Updated Apr 10, 2013
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Backblaze (2013). Drive_Stats [Dataset]. https://huggingface.co/datasets/backblaze/Drive_Stats
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 10, 2013
Dataset provided by
Backblaze
Backblazehttp://backblaze.com/
Authors
Backblaze
License
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Description
Drive Stats

Drive Stats is a public data set of daily metrics on the hard drives in Backblaze’s cloud storage infrastructure that Backblaze has open-sourced since April 2013. Currently, Drive Stats comprises over 388 million records, rising by over 240,000 records per day. Drive Stats is an append-only dataset effectively logging daily statistics that once written are never updated or deleted. This is our first Hugging Face dataset; feel free to suggest improvements by creating a… See the full description on the dataset page: https://huggingface.co/datasets/backblaze/Drive_Stats.
d
MLP-based Learnable Window Size Dataset for Bitcoin Market Price
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rajabi, Shahab (2023). MLP-based Learnable Window Size Dataset for Bitcoin Market Price [Dataset]. http://doi.org/10.7910/DVN/5YBLKV
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/5YBLKV
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Rajabi, Shahab
Description
The dataset of this paper is collected based on Google, Blockchain, and the Bitcoin market. Generally, there is a total of 26 features, however, a feature whose correlation rate is lower than 0.3 between the variations of price and the variations of feature has been eliminated. Hence, a total of 21 practical features including Market capitalization, Trade-volume, Transaction-fees USD, Average confirmation time, Difficulty, High price, Low price, Total hash rate, Block-size, Miners-revenue, N-transactions-total, Google searches, Open price, N-payments-per Block, Total circulating Bitcoin, Cost-per-transaction percent, Fees-USD-per transaction, N-unique-addresses, N-transactions-per block, and Output-volume have been selected. In addition to the values of these features, for each feature, a new one is created that includes the difference between the previous day and the day before the previous day as a supportive feature. From the point of view of the number and history of the dataset used, a total of 1275 training data were used in the proposed model to extract patterns of Bitcoin price and they were collected from 12 Nov 2018 to 4 Jun 2021.
COVID19 - The New York Times
kaggle.com
zip
Updated May 18, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). COVID19 - The New York Times [Dataset]. https://www.kaggle.com/bigquery/covid19-nyt
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 18, 2020
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
Description
Context

This is the US Coronavirus data repository from The New York Times . This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here . For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site

Sample Queries

Query 1

Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.

SELECT covid19.county, covid19.state_name, total_pop AS county_population, confirmed_cases, ROUND(confirmed_cases/total_pop *100000,2) AS confirmed_cases_per_100000, deaths, ROUND(deaths/total_pop *100000,2) AS deaths_per_100000 FROM bigquery-public-data.covid19_nyt.us_counties covid19 JOIN bigquery-public-data.census_bureau_acs.county_2017_5yr acs ON covid19.county_fips_code = acs.geo_id WHERE date = DATE_SUB(CURRENT_DATE(),INTERVAL 1 day) AND covid19.county_fips_code != "00000" ORDER BY confirmed_cases_per_100000 desc

Query 2

How do I calculate the number of new COVID-19 cases per day? This query determines the total number of new cases in each state for each day available in the dataset SELECT b.state_name, b.date, MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases FROM (SELECT state_name AS state, state_fips_code , confirmed_cases, DATE_ADD(date, INTERVAL 1 day) AS date_shift FROM bigquery-public-data.covid19_nyt.us_states WHERE confirmed_cases + deaths > 0) a JOIN bigquery-public-data.covid19_nyt.us_states b ON a.state_fips_code = b.state_fips_code AND a.date_shift = b.date GROUP BY b.state_name, date ORDER BY date desc
Data (i.e., evidence) about evidence based medicine
figshare.com
search.datacite.org
png
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jorge H Ramirez (2023). Data (i.e., evidence) about evidence based medicine [Dataset]. http://doi.org/10.6084/m9.figshare.1093997.v24
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1093997.v24
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Jorge H Ramirez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Update — December 7, 2014. – Evidence-based medicine (EBM) is not working for many reasons, for example: 1. Incorrect in their foundations (paradox): hierarchical levels of evidence are supported by opinions (i.e., lowest strength of evidence according to EBM) instead of real data collected from different types of study designs (i.e., evidence). http://dx.doi.org/10.6084/m9.figshare.1122534 2. The effect of criminal practices by pharmaceutical companies is only possible because of the complicity of others: healthcare systems, professional associations, governmental and academic institutions. Pharmaceutical companies also corrupt at the personal level, politicians and political parties are on their payroll, medical professionals seduced by different types of gifts in exchange of prescriptions (i.e., bribery) which very likely results in patients not receiving the proper treatment for their disease, many times there is no such thing: healthy persons not needing pharmacological treatments of any kind are constantly misdiagnosed and treated with unnecessary drugs. Some medical professionals are converted in K.O.L. which is only a puppet appearing on stage to spread lies to their peers, a person supposedly trained to improve the well-being of others, now deceits on behalf of pharmaceutical companies. Probably the saddest thing is that many honest doctors are being misled by these lies created by the rules of pharmaceutical marketing instead of scientific, medical, and ethical principles. Interpretation of EBM in this context was not anticipated by their creators. “The main reason we take so many drugs is that drug companies don’t sell drugs, they sell lies about drugs.” ―Peter C. Gøtzsche “doctors and their organisations should recognise that it is unethical to receive money that has been earned in part through crimes that have harmed those people whose interests doctors are expected to take care of. Many crimes would be impossible to carry out if doctors weren’t willing to participate in them.” —Peter C Gøtzsche, The BMJ, 2012, Big pharma often commits corporate crime, and this must be stopped. Pending (Colombia): Health Promoter Entities (In Spanish: EPS ―Empresas Promotoras de Salud).

Misinterpretations New technologies or concepts are difficult to understand in the beginning, it doesn’t matter their simplicity, we need to get used to new tools aimed to improve our professional practice. Probably the best explanation is here in these videos (credits to Antonio Villafaina for sharing these videos with me). English https://www.youtube.com/watch?v=pQHX-SjgQvQ&w=420&h=315 Spanish https://www.youtube.com/watch?v=DApozQBrlhU&w=420&h=315 ----------------------- Hypothesis: hierarchical levels of evidence based medicine are wrong Dear Editor, I have data to support the hypothesis described in the title of this letter. Before rejecting the null hypothesis I would like to ask the following open question:Could you support with data that hierarchical levels of evidence based medicine are correct? (1,2) Additional explanation to this question: – Only respond to this question attaching publicly available raw data.– Be aware that more than a question this is a challenge: I have data (i.e., evidence) which is contrary to classic (i.e., McMaster) or current (i.e., Oxford) hierarchical levels of evidence based medicine. An important part of this data (but not all) is publicly available. References

Ramirez, Jorge H (2014): The EBM challenge. figshare. http://dx.doi.org/10.6084/m9.figshare.1135873

The EBM Challenge Day 1: No Answers. Competing interests: I endorse the principles of open data in human biomedical research Read this letter on The BMJ – August 13, 2014.http://www.bmj.com/content/348/bmj.g3725/rr/762595Re: Greenhalgh T, et al. Evidence based medicine: a movement in crisis? BMJ 2014; 348: g3725. _ Fileset contents Raw data: Excel archive: Raw data, interactive figures, and PubMed search terms. Google Spreadsheet is also available (URL below the article description). Figure 1. Unadjusted (Fig 1A) and adjusted (Fig 1B) PubMed publication trends (01/01/1992 to 30/06/2014). Figure 2. Adjusted PubMed publication trends (07/01/2008 to 29/06/2014) Figure 3. Google search trends: Jan 2004 to Jun 2014 / 1-week periods. Figure 4. PubMed publication trends (1962-2013) systematic reviews and meta-analysis, clinical trials, and observational studies.
Figure 5. Ramirez, Jorge H (2014): Infographics: Unpublished US phase 3 clinical trials (2002-2014) completed before Jan 2011 = 50.8%. figshare.http://dx.doi.org/10.6084/m9.figshare.1121675 Raw data: "13377 studies found for: Completed | Interventional Studies | Phase 3 | received from 01/01/2002 to 01/01/2014 | Worldwide". This database complies with the terms and conditions of ClinicalTrials.gov: http://clinicaltrials.gov/ct2/about-site/terms-conditions Supplementary Figures (S1-S6). PubMed publication delay in the indexation processes does not explain the descending trends in the scientific output of evidence-based medicine. Acknowledgments I would like to acknowledge the following persons for providing valuable concepts in data visualization and infographics:

Maria Fernanda Ramírez. Professor of graphic design. Universidad del Valle. Cali, Colombia.

Lorena Franco. Graphic design student. Universidad del Valle. Cali, Colombia. Related articles by this author (Jorge H. Ramírez)

Ramirez JH. Lack of transparency in clinical trials: a call for action. Colomb Med (Cali) 2013;44(4):243-6. URL: http://www.ncbi.nlm.nih.gov/pubmed/24892242

Ramirez JH. Re: Evidence based medicine is broken (17 June 2014). http://www.bmj.com/node/759181

Ramirez JH. Re: Global rules for global health: why we need an independent, impartial WHO (19 June 2014). http://www.bmj.com/node/759151

Ramirez JH. PubMed publication trends (1992 to 2014): evidence based medicine and clinical practice guidelines (04 July 2014). http://www.bmj.com/content/348/bmj.g3725/rr/759895 Recommended articles

Greenhalgh Trisha, Howick Jeremy,Maskrey Neal. Evidence based medicine: a movement in crisis? BMJ 2014;348:g3725

Spence Des. Evidence based medicine is broken BMJ 2014; 348:g22

Schünemann Holger J, Oxman Andrew D,Brozek Jan, Glasziou Paul, JaeschkeRoman, Vist Gunn E et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies BMJ 2008; 336:1106

Lau Joseph, Ioannidis John P A, TerrinNorma, Schmid Christopher H, OlkinIngram. The case of the misleading funnel plot BMJ 2006; 333:597

Moynihan R, Henry D, Moons KGM (2014) Using Evidence to Combat Overdiagnosis and Overtreatment: Evaluating Treatments, Tests, and Disease Definitions in the Time of Too Much. PLoS Med 11(7): e1001655. doi:10.1371/journal.pmed.1001655

Katz D. A-holistic view of evidence based medicinehttp://thehealthcareblog.com/blog/2014/05/02/a-holistic-view-of-evidence-based-medicine/ ---
Google energy consumption 2011-2023
statista.com
ai-chatbox.pro
Updated Oct 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Google energy consumption 2011-2023 [Dataset]. https://www.statista.com/statistics/788540/energy-consumption-of-google/
Explore at:
Dataset updated
Oct 11, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, using smart temperature and lighting, advanced cooling techniques, and machine learning. Datacenters and energy Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than the average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for the future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills. Reducing Google’s carbon footprint Google’s clean energy efforts is also related to their efforts to reduce their carbon footprint. Since their commitment to using 100 percent renewable energy, the company has met their targets largely through solar and wind energy power purchase agreements and buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.
H
Capturing the Aftermath of the Dobbs v. Jackson Decision in the Google...
dataverse.harvard.edu
Updated Jan 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brooke Perreault; Anya Wintner; Lan Dau; Eni Mustafaraj (2023). Capturing the Aftermath of the Dobbs v. Jackson Decision in the Google Search Results across 65 U.S. Locations [Dataset]. http://doi.org/10.7910/DVN/YFAH9X
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/YFAH9X
Dataset updated
Jan 17, 2023
Dataset provided by
Harvard Dataverse
Authors
Brooke Perreault; Anya Wintner; Lan Dau; Eni Mustafaraj
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Dataset for the paper "Capturing the Aftermath of the Dobbs v. Jackson Decision in the Google Search Results across 65 U.S. Locations" to appear in the proceedings of ICWSM 2023. Starting on the day of the U.S Supreme Court decision to overturn Roe v. Wade, we collected Google Search result pages for 21 days in 65 U.S. locations for a set of almost 1,700 queries. We stored all the SERPs generated by Google. Because the archives containing these SERPs are much larger than the file limits of the Harvard Dataverse, you can find them at this address: https://cs.wellesley.edu/~credlab/icwsm2023/. Instead, in this repository we will share all the files that were created by parsing some of the information in the SERPs: organic search results, top stories, and embedded tweets. We also provide aggregated statistics for the domains appearing in the organic results and the top stories. This dataset can be useful for answering questions about Google Search's algorithms with respect to shaping access to information related to important news events.
Ethereum Cryptocurrency
console.cloud.google.com
Updated Apr 22, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Ethereum&hl=ES&inv=1&invt=Ab2tpQ (2023). Ethereum Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/ethereum/crypto-ethereum-blockchain?hl=ES
Explore at:
Dataset updated
Apr 22, 2023
Dataset provided by
Googlehttp://google.com/
Description
Ethereum is a crypto currency which leverages blockchain technology to store transactions in a distributed ledger. A blockchain is an ever-growing "tree" of blocks, where each block contains a number of transactions. To learn more, read the "Ethereum in BigQuery: a Public Dataset for smart contract analytics" blog post by Google Developer Advocate Allen Day. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program . The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
Data from: Examining bias perpetuation in academic search engines: an...
zenodo.org
bin, csv, zip
Updated Feb 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ulloa Roberto; Ulloa Roberto (2024). Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar [Dataset]. http://doi.org/10.5281/zenodo.10636247
Explore at:
bin, zip, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10636247
Dataset updated
Feb 8, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Ulloa Roberto; Ulloa Roberto
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Main dataset (main.csv)

The main file contains an entry (N=28530) per search result in all collected pages. It comprises the following columns:

id: Unique identifier of the file (corresponds to the last part of the filename)

filename: Name of the file associated with the row (the file is in serp_html.zip)

engine: The search engine used (Google Scholar or Semantic Scholar).

browser: The web browser used for the search (Firefox or Chrome)

region: The geographical region where the search was made.

year: The year when the search was made

month: The month when the search was made

day: The day when the search was made

query: The full search query that was used

query_type: The type of the search query (health or technology)

topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')

trt: Treatment variable associated with the search (benefits or risks).

url: The URL of the (article) search result

title: The title of the (article) search result.

authorship: The author(s) of the (article) search result.

abstract_id: Unique identifier for the abstract of the (article) search result which connects with annotated-abstracts_v0.6.xlsx

abstract_hash: Hash value of the abstract for data integrity

link_n: The total number of results in the search page

rank: The rank of the search result on the search engine results page.

annotation: Any annotations associated with the (article's abstract) search result. One of: '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '1. Confirms benefits', '2. Confirms risks', '5. Abstract not related to {topic}')

valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts only containing benefits

Annotated abstracts (annotated-abstracts_v0.6.xlsx)

Manually annotated abstracts resulting from the searches.

Raw search engine result pages (serp_html.zip)

The zip contains an HTML per search engine result page collected (N=2853). See column filename from the main dataset.
Bitcoin Blockchain Historical Data
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Bitcoin Blockchain Historical Data [Dataset]. https://www.kaggle.com/bigquery/bitcoin-blockchain
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the Blockchain to store transactions in a distributed manner in order to mitigate against flaws in the financial industry.

Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.

Content

In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset. It’s updated it every 10 minutes. The data can be joined with historical prices in kernels. See available similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.

Method & Acknowledgements

Allen Day (Twitter | Medium), Google Cloud Developer Advocate & Colin Bookman, Google Cloud Customer Engineer retrieve data from the Bitcoin network using a custom client available on GitHub that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk to two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as they are now appended when new blocks are broadcast to the Bitcoin network. For additional information visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".

Photo by Andre Francois on Unsplash.

Inspiration

How many bitcoins are sent each day?

How many addresses receive bitcoin each day?

Compare transaction volume to historical prices by joining with other available data sources
Zilliqa Cryptocurrency
console.cloud.google.com
Updated Mar 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:Cloud%20Public%20Datasets%20-%20Finance&inv=1&invt=Ab3jKQ (2021). Zilliqa Cryptocurrency [Dataset]. https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-zilliqa-dataset
Explore at:
Dataset updated
Mar 5, 2021
Dataset provided by
Googlehttp://google.com/
Description
Zilliqa is a blockchain platform designed around the concept of sharding. Sharding means dividing the network into several smaller component networks that are able to process transactions in parallel. Cryptocurrency markets are becoming more accessible and analysis is increasing by the day. Gone will be the days when investors jump into crypto to become overnight millionaires and it is the reason some ICOs are doing well while others have been losing value since January. Zilliqa platform is among the few projects that seem to be gaining favor from different facets in the financial services sector. Zilliqa aims at maximizing scalability within blockchain tech. The platform has been developed using sharding tech in order to interlink more networks. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program is hosting several cryptocurrency datasets, with plans to both expand offerings to include additional cryptocurrencies and reduce the latency of updates. You can find these datasets by searching "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries below to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using
Lead Scoring Dataset
kaggle.com
zip
Updated Aug 17, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amrita Chatterjee (2020). Lead Scoring Dataset [Dataset]. https://www.kaggle.com/amritachatterjee09/lead-scoring-dataset
Explore at:
zip(411028 bytes)Available download formats
Dataset updated
Aug 17, 2020
Authors
Amrita Chatterjee
Description
Context

An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.

The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses or fill up a form for the course or watch some videos. When these people fill up a form providing their email address or phone number, they are classified to be a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X education is around 30%.

Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if, say, they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most potential leads, also known as ‘Hot Leads’. If they successfully identify this set of leads, the lead conversion rate should go up as the sales team will now be focusing more on communicating with the potential leads rather than making calls to everyone.

There are a lot of leads generated in the initial stage (top) but only a few of them come out as paying customers from the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc. ) in order to get a higher lead conversion.

X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model wherein you need to assign a lead score to each of the leads such that the customers with higher lead score h have a higher conversion chance and the customers with lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark of the target lead conversion rate to be around 80%.

Content

Variables Description * Prospect ID - A unique ID with which the customer is identified. * Lead Number - A lead number assigned to each lead procured. * Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc. * Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc. * Do Not Email -An indicator variable selected by the customer wherein they select whether of not they want to be emailed about the course or not. * Do Not Call - An indicator variable selected by the customer wherein they select whether of not they want to be called about the course or not. * Converted - The target variable. Indicates whether a lead has been successfully converted or not. * TotalVisits - The total number of visits made by the customer on the website. * Total Time Spent on Website - The total time spent by the customer on the website. * Page Views Per Visit - Average number of pages on the website viewed during the visits. * Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc. * Country - The country of the customer. * Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization' which means the customer had not selected this option while filling the form. * How did you hear about X Education - The source from which the customer heard about X Education. * What is your current occupation - Indicates whether the customer is a student, umemployed or employed. * What matters most to you in choosing this course An option selected by the customer - indicating what is their main motto behind doing this course. * Search - Indicating whether the customer had seen the ad in any of the listed items. * Magazine
* Newspaper Article * X Education Forums
* Newspaper * Digital Advertisement * Through Recommendations - Indicates whether the customer came in through recommendations. * Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses. * Tags - Tags assigned to customers indicating the current status of the lead. * Lead Quality - Indicates the quality of lead based on the data and intuition the employee who has been assigned to the lead. * Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content. * Get updates on DM Content - Indicates whether the customer wants updates on the DM Content. * Lead Profile - A lead level assigned to each customer based on their profile. * City - The city of the customer. * Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile * Asymmetric Profile Index * Asymmetric Activity Score * Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not. * a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not. * Last Notable Activity - The last notable activity performed by the student.

Acknowledgements

UpGrad Case Study

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Keywords for Lawyers
link-assistant.com
xlsx
Updated May 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SEO PowerSuite (2023). Keywords for Lawyers [Dataset]. https://www.link-assistant.com/news/keywords-for-lawyers.html
Explore at:
xlsxAvailable download formats
Dataset updated
May 13, 2023
Dataset provided by
Authors
SEO PowerSuite
Description
A dataset of keywords that are relevant to lawyers, including their definitions, synonyms, antonyms, search volume and costs.
f
Dataset for Multivariate Bitcoin Price Forecasting.
figshare.com
txt
Updated Apr 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anny Mardjo; Chidchanok Choksuchat (2023). Dataset for Multivariate Bitcoin Price Forecasting. [Dataset]. http://doi.org/10.6084/m9.figshare.22678540.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.22678540.v1
Dataset updated
Apr 22, 2023
Dataset provided by
figshare
Authors
Anny Mardjo; Chidchanok Choksuchat
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset was collected for the period spanning between 01/07/2019 and 31/12/2022.The historical Twitter volume were retrieved using ‘‘Bitcoin’’ (case insensitive) as the keyword from bitinfocharts.com. Google search volume was retrieved using library Gtrends. 2000 tweets per day using 4 times interval were crawled by employing Twitter API with the keyword “Bitcoin. The daily closing prices of Bitcoin, oil price, gold price, and U.S stock market indexes (S&P 500, NASDAQ, and Dow Jones Industrial Average) were collected using R libraries either Quantmod or Quandl.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Ankush Kamboj (2025). Wordle Answer Search Trends Dataset (2021–2025) [Dataset]. https://www.kaggle.com/datasets/kambojankush/wordle-answer-search-trends-dataset-20212025/discussion

Wordle Answer Search Trends Dataset (2021–2025)

It includes daily data for each Wordle answer, its search trend on that day, and

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jun 26, 2025

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Ankush Kamboj

License

https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html

Description

This dataset investigates the relationship between Wordle answers and Google search spikes, particularly for uncommon words. It spans from June 21, 2021 to June 24, 2025.

It includes daily data for each Wordle answer, its search trend on that day, and frequency-based commonality indicators.

🔍 Hypothesis

Each Wordle answer causes a spike in search volume on the day it appears — more so if the word is rare.

This dataset supports exploration of:

Wordle Answers
Trends for wordle answers
Correlation between wordle answer rarity and search interest

Columns

Column	Description
`date`	Date of the Wordle puzzle
`word`	Correct 5-letter Wordle answer
`game`	Wordle game number
`wordfreq_commonality`	Normalized frequency score using Python’s `wordfreq` library
`subtlex_commonality`	Normalized frequency score using SUBTLEX-US dataset
`trend_day_global`	Google search interest on the day (global, all categories)
`trend_avg_200_global`	200-day average search interest (global, all categories)
`trend_day_language`	Search interest on Wordle day (Language Resources category)
`trend_avg_200_language`	200-day average search interest (Language Resources category)

Notes: - All trend values are relative (0–100 scale, per Google Trends)

🧮 Methodology

Wordle answers were scraped from wordfinder.yourdictionary.com
Commonality scores were computed using:
- wordfreq Python library
- SUBTLEX-US dataset (subtitle frequency, approximating spoken English)
Trend data was fetched using Google Trends API via pytrends

📊 Analysis

Can find analysis done using this data in the blog post

Clear search

Close search

Google apps

Main menu

Wordle Answer Search Trends Dataset (2021–2025)

🔍 Hypothesis

Columns

🧮 Methodology

📊 Analysis

Drive_Stats

MLP-based Learnable Window Size Dataset for Bitcoin Market Price

COVID19 - The New York Times

Context

Sample Queries

Query 1

Query 2

Data (i.e., evidence) about evidence based medicine

Google energy consumption 2011-2023

Capturing the Aftermath of the Dobbs v. Jackson Decision in the Google...

Ethereum Cryptocurrency

Data from: Examining bias perpetuation in academic search engines: an...

Main dataset (main.csv)

Annotated abstracts (annotated-abstracts_v0.6.xlsx)

Raw search engine result pages (serp_html.zip)

Bitcoin Blockchain Historical Data

Context

Content

Querying BigQuery tables

Method & Acknowledgements

Inspiration

Zilliqa Cryptocurrency

Lead Scoring Dataset

Context

Content

Acknowledgements

Inspiration

Keywords for Lawyers

Dataset for Multivariate Bitcoin Price Forecasting.

Wordle Answer Search Trends Dataset (2021–2025)

It includes daily data for each Wordle answer, its search trend on that day, and

🔍 Hypothesis

Columns

🧮 Methodology

📊 Analysis