Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
When we think about the Internet, search engines quickly come to mind: they enable efficient and precise use of resources on the World Wide Web. In the Western world, the search engine market has been dominated by Google for years, which does not mean that Google holds the entire market. This database tracks the market shares of various search engines over the last 16 years.
The database is saved as a .csv file with 28 columns. The first column contains the date (YYYY-MM) of the measurement period. Each subsequent column contains the market share of one search engine, given as a percentage rounded to 2 decimal places (a share below 0.005% is recorded as 0, even though the engine may still hold a very small share). There are 191 rows in total, one per month since January 2009, i.e. almost 16 years of data.
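The layout described above (a YYYY-MM date column followed by one percentage column per search engine) can be read with the Python standard library alone. The sketch below is only an illustration, not the dataset author's code: the column names `Date`, `Google` and `bing` and the two sample rows are invented for the example, and only the January 2009 start and the 191-row count come from the description. It also checks the date arithmetic: 191 monthly rows starting in January 2009 end in November 2024.

```python
import csv
import io


def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + (int(month) - 1)


def last_month(start: str, n_rows: int) -> str:
    """Return the 'YYYY-MM' label of the last row, given one row per month."""
    idx = month_index(start) + n_rows - 1
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def load_shares(text: str) -> list:
    """Parse CSV text: the first column is the date, the rest are percentages."""
    reader = csv.DictReader(io.StringIO(text))
    date_col = reader.fieldnames[0]
    rows = []
    for rec in reader:
        row = {"date": rec[date_col]}
        for name, value in rec.items():
            if name != date_col:
                # A stored 0 may hide a real share below 0.005%.
                row[name] = float(value)
        rows.append(row)
    return rows


# Invented sample in the described layout (hypothetical names and values).
SAMPLE = "Date,Google,bing\n2009-01,90.28,0\n2009-02,89.74,0.01\n"

if __name__ == "__main__":
    print(last_month("2009-01", 191))
    print(load_shares(SAMPLE))
```

Because values are rounded to two decimals, a 0 in a share column is best read as "below 0.005%" rather than "absent".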
The database comes from Statcounter and is made available under the CC BY-SA 3.0 license, which allows copying, using and disseminating the data, including for commercial purposes, provided the source is credited.
Photo by Duncan Meyer on Unsplash
In August 2025, Google.com was the most visited website worldwide, with an average of 98.2 billion monthly visits. The platform has maintained its leading position since June 2010, when it surpassed Yahoo to take first place. YouTube ranked second during the same period, recording over 48 billion monthly visits.
The internet leaders: search, social, and e-commerce
Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user-generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.
What is next for online content?
Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
https://scoop.market.us/privacy-policy
grok-4-fast-search by xAI achieved the highest rank in LMArena's web search category in November 2025 with ***** points. The ranking evaluates AI models on their web search for real-time information, external knowledge, and grounded citations. Second place went to the Perplexity model ppl-sonar-pro-high.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of the top 50 most visited websites in the world, as well as the category and principal country/territory for each site. The data provides insights into which sites are most popular globally, and what type of content is most popular in different parts of the world.
This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories:
- To track the most popular websites in the world over time
- To see how website popularity changes by region
- To find out which website categories are most popular
Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL)
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: df_1.csv
| Column name | Description |
|:--------------------------------|:---------------------------------------------------------------------|
| Site | The name of the website. (String) |
| Domain Name | The domain name of the website. (String) |
| Category | The category of the website. (String) |
| Principal country/territory | The principal country/territory where the website is based. (String) |
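The use cases listed earlier (popularity by region, most popular categories) reduce to counting rows per column value. The following is a minimal sketch using only the Python standard library. The four column names come from the table above, but the sample rows are invented placeholders and are not copied from df_1.csv; `count_by` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Placeholder rows only; they are NOT taken from df_1.csv.
SAMPLE = """Site,Domain Name,Category,Principal country/territory
Example Search,search.example,Search engine,Exampleland
Example Video,video.example,Video sharing,Exampleland
Example Shop,shop.example,E-commerce,Otherland
Example Find,find.example,Search engine,Otherland
"""


def count_by(text: str, column: str) -> Counter:
    """Count how many sites share each value of the given column."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row[column] for row in reader)


by_category = count_by(SAMPLE, "Category")
by_country = count_by(SAMPLE, "Principal country/territory")

if __name__ == "__main__":
    print(by_category.most_common())
    print(by_country.most_common())
```

To run this against the real file, the text of df_1.csv would replace `SAMPLE`, assuming the header row matches the column names in the table.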
https://cdla.io/sharing-1-0/
Tracks search interest over time, showing peaks and troughs in popularity for specific keywords.
Provides data on search trends by location, allowing for geographic comparisons of interest.
Lists associated search terms, highlighting related topics that are frequently searched alongside the primary keywords.
Distinguishes between the most popular queries and those with a sharp increase in search volume.
Organizes search data by category, enabling focused insights into specific industries, interests, or demographic groups.
Offers access to both historical and real-time data, ideal for identifying ongoing or emerging trends.
As of October 2025, Google represented ***** percent of the global online search engine referrals on desktop devices. Although Google is far ahead of its competitors, this represents only a modest increase from the previous months. Meanwhile, its longtime competitor Bing accounted for ***** percent, as tools like Yahoo and Yandex held shares of over **** percent and **** percent respectively.
Google and the global search market
Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2024, with a market capitalization of **** trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2024 with roughly ****** billion U.S. dollars.
Search engine usage in different countries
Google is the most frequently used search engine worldwide. But in some countries, its alternatives are leading or competing with it to some extent. As of the last quarter of 2023, more than ** percent of internet users in Russia used Yandex, whereas Google users represented a little over ** percent. Meanwhile, Baidu was the most used search engine in China, despite a strong decrease in the percentage of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. By the end of 2024, nearly half of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over ** percent of users in Mexico said they used Yahoo.
A dataset of web design keywords, including their definitions, synonyms, antonyms, search volume and costs.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Guan Lin Tao
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google Trends of search terms is a useful proxy for the most popular topics at any given time. This dataset contains many of the top such worldwide searches in 2021.
In the second quarter of 2025, mobile devices (excluding tablets) accounted for 62.54 percent of global website traffic. After holding a share of around 50 percent from 2017 onward, mobile usage surpassed this threshold in 2020 and has steadily extended its dominance of global web access.
Mobile traffic
Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight to mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile makes up only around 45.49 percent of online traffic in the United States.
Mobile usage
The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go, and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.
https://creativecommons.org/publicdomain/zero/1.0/
By Twitter [source]
This dataset contains sentiment analysis and information on the tweets of GoldGloveTV, one of the most popular gaming Twitch streamers. It includes data such as timestamp, content of the tweet, number of likes and replies, retweet count, URL associated with each tweet, conversation id associated to the given tweet and various other metadata. This dataset offers invaluable insights about the engagement and popularity surrounding GoldGloveTV's tweets. Furthermore, precise analytical operations concerning different aspects can be performed using this data in order to understand user behaviour better. This is a valuable resource for identifying successful strategies employed by GoldGloveTV in terms of marketing his brand or understanding how users engage with his content on this social media platform
Exploring the descriptor data: The first step to analyzing this dataset is to explore the descriptive information such as the tweet timestamp, text, likes, reply count and retweet count among others. This will enable you to look at the trend of GoldGloveTV’s engagement and gain an idea of their most popular posts.
Analyze sentiment: Another useful way to use this dataset is to analyze sentiment by looking at each individual tweet's polarity (positive/negative) or subjectivity (objective/subjective). This could provide valuable insight into what topics people are generally interested in or enthusiastic about when discussing GoldGloveTV on Twitter.
Compare conversations: You can also compare tweets that share the same conversation id to get a bigger picture of how people are discussing specific topics related to GoldGloveTV. Additionally, you can use the URL data to check out any videos that were released alongside certain tweets for more context (if needed).
Visualizing results: Finally, once you have gathered insights from the data, visualize them with charts such as scatter plots or bar graphs so that others can understand your findings quickly instead of working through raw numbers.
Analyzing the trends of customer feedback over time to determine the sentiment associated with a particular brand or product. This can be used to help companies adjust their promotional strategies and improve their customer experience.
Using sentiment analysis on Twitter comments related to specific topics to support market research and gather insights from user feedback.
Analyzing the sentiment around different hashtags in order to track conversations about current events, products, services, and brands in real time and measure how people are responding to them.
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
If you use this dataset in your research, please credit Twitter.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from the high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, who seek health information online. This study examines the impact of HIV/AIDS and MSM news coverage on web search queries in Hong Kong.
Methods: Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-population. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using the Granger causality test and the orthogonal impulse function.
Results: We found that editorial news has an impact on “HIV” Google searches, with the search term popularity peaking at an average of two weeks after the news is published. Similarly, editorial news has an impact on the frequency of “AIDS” searches two weeks later. MSM-related news trends have a more fluctuating impact on “MSM” Google searches, with a time lag that varies from one week to ten weeks.
Conclusions: This infodemiological study shows that news trends have a positive impact on online search behavior for HIV/AIDS or MSM-related issues for up to ten weeks afterwards. Health promotion professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise their reach and effectiveness.
Google Chrome is a popular web browser developed by Google.
The Chrome User Experience Report is a public dataset of key user experience metrics for popular origins on the web, as experienced by Chrome users under real-world conditions.
https://bigquery.cloud.google.com/dataset/chrome-ux-report:all
For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/
License: CC BY 4.0
Photo by Edho Pratama on Unsplash
The problem of interest is the prediction of the apply rate. Imagine a user visiting a website and performing a job search. From the set of displayed results, the user clicks on the ones she is interested in and, after checking the job descriptions, clicks the apply button to land on an application page. The apply rate is defined as the fraction of applies (after visiting job description pages), and the goal is to predict this metric using the dataset described in the following section.
Each row in the dataset corresponds to a user’s view of a job listing. It has 10 columns as described below.
Please use the “search date pacific” column (9th column) to split the dataset into training and test sets. Train your model(s) using the data between 01/21/2018 and 01/26/2018, and test your model on 01/27/2018. Split the analysis into two parts:
Focus on the first 7 columns. Use these as features to predict the 8th column, “apply”. Discuss the model you choose. Primarily focus on AUC as the metric of interest for your binary classifier. You can also investigate and discuss other metrics.
Now consider adding the last column to your feature set. Is it possible to segment the data based on “class id” and achieve better classification performance? How would you generate a better model using this feature?
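The date-based split and the AUC metric described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a reference solution: the dictionary keys `search_date_pacific`, `apply` and `class_id` and the `%Y-%m-%d` date format are guesses at how the file is encoded, and the helper names are hypothetical. The AUC is computed directly from its pairwise definition (the share of positive/negative pairs in which the positive example scores higher, with ties counted as half), which is quadratic in the number of rows and only suitable for small samples.

```python
from collections import defaultdict
from datetime import date, datetime

TRAIN_START, TRAIN_END = date(2018, 1, 21), date(2018, 1, 26)
TEST_DAY = date(2018, 1, 27)


def split_by_date(rows, date_key="search_date_pacific", fmt="%Y-%m-%d"):
    """Split rows into (train, test) by search date; other dates are dropped."""
    train, test = [], []
    for row in rows:
        day = datetime.strptime(row[date_key], fmt).date()
        if TRAIN_START <= day <= TRAIN_END:
            train.append(row)
        elif day == TEST_DAY:
            test.append(row)
    return train, test


def apply_rate(rows, label_key="apply"):
    """Fraction of job-description views that ended in an apply."""
    return sum(int(r[label_key]) for r in rows) / len(rows)


def apply_rate_by_segment(rows, segment_key="class_id", label_key="apply"):
    """Apply rate computed separately for each value of the segment column."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[segment_key]].append(r)
    return {k: apply_rate(v, label_key) for k, v in groups.items()}


def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Comparing `apply_rate_by_segment` on the training rows is one simple way to see whether “class id” separates the data enough to justify per-segment models.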
https://www.gnu.org/copyleft/gpl.html
Although extensive lists of open datasets are available in catalogues, most data publishers still connect their datasets to other popular datasets, such as DBpedia, Freebase and Geonames. Although linking to popular datasets allows external resources to be explored, it fails to cover highly specialized information. Catalogues of linked data describe the content of datasets in terms of update periodicity, authors, SPARQL endpoints and linksets with other datasets, among others, as recommended by the W3C VoID Vocabulary. However, catalogues by themselves do not provide any explicit information to help the URI linkage process.
Search techniques can rank the available datasets S_i according to the probability that links can be defined between the URIs of S_i and a given dataset T to be published, so that most of the links, if not all, can be found by inspecting the most relevant datasets in the ranking. dataset-search is a tool for searching datasets for linkage.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Web Accessibility Improvement: The "Web Page Object Detection" model can be used to identify and label various elements on a web page, making it easier for people with visual impairments to navigate and interact with websites using screen readers and other assistive technologies.
Web Design Analysis: The model can be employed to analyze the structure and layout of popular websites, helping web designers understand best practices and trends in web design. This information can inform the creation of new, user-friendly websites or redesigns of existing pages.
Automatic Web Page Summary Generation: By identifying and extracting key elements, such as titles, headings, content blocks, and lists, the model can assist in generating concise summaries of web pages, which can aid users in their search for relevant information.
Web Page Conversion and Optimization: The model can be used to detect redundant or unnecessary elements on a web page and suggest their removal or modification, leading to cleaner designs and faster-loading pages. This can improve user experience and, potentially, search engine rankings.
Assisting Web Developers in Debugging and Testing: By detecting web page elements, the model can help identify inconsistencies or errors in a site's code or design, such as missing or misaligned elements, allowing developers to quickly diagnose and address these issues.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Similarity indices between relative web search popularity and Covid-19 time-series.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from web search engines have become a valuable adjunct in epidemiology and public health, specifically during epidemics. We aimed to explore the concordance of web search popularity for Covid-19 across 6 Western nations (United Kingdom, United States, France, Italy, Spain and Germany) and how timeline changes align with the pandemic waves, Covid-19 mortality, and incident case trajectories. We used the Google Trends tool for web-search popularity, and “Our World in Data” on Covid-19 reported cases, deaths, and administrative responses (measured by stringency index) to analyze country-level data. The Google Trends tool provides spatiotemporal data, scaled to a range of
Baidu Search Index is a big data analytics tool developed by Baidu to track changes in keyword search popularity within its search engine. By analyzing trends in the Baidu Search Index for specific keywords, users can effectively monitor public interest in topics, companies, or brands.
As an ecosystem partner of Baidu Index, Datago has direct access to keyword search index data from Baidu's database, leveraging this information to build the BSIA-Consumer. This database encompasses popular brands that are actively searched by Chinese consumers, along with their commonly used names. By tracking Baidu Index search trends for these keywords, Datago precisely maps them to their corresponding publicly listed stocks.
The database covers over 1,100 consumer stocks and 3,000+ brand keywords across China, the United States, Europe, and Japan, with a particular focus on popular sectors like luxury goods and vehicles. Through its analysis of Chinese consumer search interest, this database offers investors a unique perspective on market sentiment, consumer preferences, and brand influence, including:
Brand Influence Tracking – By leveraging Baidu Search Index data, investors can assess the level of consumer interest in various brands, helping to evaluate their influence and trends within the Chinese market.
Consumer Stock Mapping – BSIA-consumer provides an accurate linkage between brand keywords and their associated consumer stocks, enabling investor analysis driven by consumer interest.
Coverage of Popular Consumer Goods – BSIA-consumer focuses specifically on trending sectors like luxury goods and vehicles, offering valuable insights into these industries.
Coverage: 1000+ consumer stocks
History: 2016-01-01
Update Frequency: Daily