88 datasets found

DataForSEO Google Full (Keywords+SERP) database, historical data available
datarade.ai
.json, .csv
Updated Aug 17, 2023
Cite
DataForSEO (2023). DataForSEO Google Full (Keywords+SERP) database, historical data available [Dataset]. https://datarade.ai/data-products/dataforseo-google-full-keywords-serp-database-historical-d-dataforseo
Explore at:
Available download formats: .json, .csv
Dataset updated
Aug 17, 2023
Dataset provided by
Authors
DataForSEO
Area covered
Sweden, Burkina Faso, Costa Rica, United Kingdom, Côte d'Ivoire, Cyprus, South Africa, Paraguay, Portugal, Bolivia (Plurinational State of)
Description
You can check the fields description in the documentation: current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; Historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.

Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.

Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.

Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.

This database is available in JSON format only.

You don’t have to download fresh data dumps in JSON – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.

SERP data from controversial queries on Google and Bing

zenodo.org

bin, csv, zip

Updated Apr 28, 2025

Cite

Sal Hagen; Guillén Torres (2025). SERP data from controversial queries on Google and Bing [Dataset]. http://doi.org/10.5281/zenodo.14919504

Explore at:

Available download formats: zip, bin, csv

Unique identifier

https://doi.org/10.5281/zenodo.14919504

Dataset updated

Apr 28, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Sal Hagen; Guillén Torres

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Nov 24, 2024

Description

Data for the forthcoming publication 'Contested Components: Studying Interface Enrichment as a Form of Content Moderation on Google and Bing'.

Datasets contain information on SERP components for Google and Bing when querying 2000 controversial and 914 non-controversial questions.

Files include:

question_data.csv: Information on questions sourced from 4chan and leftychan boards in November 2024. Columns include the counts per board (/fit/, /b/, /pol/, /int/, /k/, /lgbt/, and /leftypol/), categorization as controversial/non-controversial, and toxicity scores determined by Perspective API.
serp_components.csv: Information on the SERP data gathered using Zoekplaatje. Collected on 24 November 2024.
screenshots.zip: Screenshots of all SERPs. Note that at times, expanding the AI Overview box on Google resulted in the search bar overlaying the generated text.
component_analysis.ipynb: Code for analyzing the data.

Component taxonomy and screenshots

Search engine	Component name	Count	Example
Google	organic	19,470	Click to view
Bing	organic	18,677	Click to view
Google	related-questions	1,534	Click to view
Bing	related-queries	1,425	Click to view
Bing	info-card	1,320	Click to view (each card is its own info-card component)
Google	related-queries	1,289	Click to view
Bing	organic-answer	1,140	Click to view (often summarised through AI-assisted means)
Bing	video-widget	776	Click to view
Bing	organic-showcase	752	Click to view
Bing	related-questions	725	Click to view
Google	ai-overview	499	Click to view
Bing	organic-wiki-widget	271	Click to view
Google	did-you-mean	223	Click to view
Bing	related-queries-carousel	219	Click to view
Bing	info-card-image	136	Click to view
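
The counts in the table above can be tallied per search engine. The following is a minimal sketch that only uses the rows listed here (the full taxonomy in the dataset may contain more components); the helper names `totals_by_engine` and `share_of` are hypothetical and not part of the dataset's own notebook.

```python
from collections import defaultdict

# Rows copied from the component table above: (search engine, component, count).
COMPONENT_COUNTS = [
    ("Google", "organic", 19470),
    ("Bing", "organic", 18677),
    ("Google", "related-questions", 1534),
    ("Bing", "related-queries", 1425),
    ("Bing", "info-card", 1320),
    ("Google", "related-queries", 1289),
    ("Bing", "organic-answer", 1140),
    ("Bing", "video-widget", 776),
    ("Bing", "organic-showcase", 752),
    ("Bing", "related-questions", 725),
    ("Google", "ai-overview", 499),
    ("Bing", "organic-wiki-widget", 271),
    ("Google", "did-you-mean", 223),
    ("Bing", "related-queries-carousel", 219),
    ("Bing", "info-card-image", 136),
]


def totals_by_engine(rows):
    """Sum the component counts for each search engine."""
    totals = defaultdict(int)
    for engine, _component, count in rows:
        totals[engine] += count
    return dict(totals)


def share_of(engine, component, rows):
    """Fraction of an engine's listed components that are `component`."""
    total = totals_by_engine(rows)[engine]
    matched = sum(c for e, name, c in rows if e == engine and name == component)
    return matched / total


if __name__ == "__main__":
    for engine, total in sorted(totals_by_engine(COMPONENT_COUNTS).items()):
        print(engine, total)
    print(round(share_of("Google", "ai-overview", COMPONENT_COUNTS), 4))
```

Because the shares are computed only over the rows shown in this listing, they are indicative rather than exact figures for the full dataset.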

DataForSEO Google Keyword Database, historical and current
datarade.ai
.json, .csv
Updated Mar 14, 2023
Cite
DataForSEO (2023). DataForSEO Google Keyword Database, historical and current [Dataset]. https://datarade.ai/data-products/dataforseo-google-keyword-database-historical-and-current-dataforseo
Explore at:
Available download formats: .json, .csv
Dataset updated
Mar 14, 2023
Dataset provided by
Authors
DataForSEO
Area covered
Cyprus, Canada, Bolivia (Plurinational State of), Singapore, Bahrain, Uruguay, Spain, Bangladesh, El Salvador, Turkey
Description
You can check the fields description in the documentation: current Keyword database: https://docs.dataforseo.com/v3/databases/google/keywords/?bash; Historical Keyword database: https://docs.dataforseo.com/v3/databases/google/history/keywords/?bash. You don’t have to download fresh data dumps in JSON or CSV – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.
Data for study "Direct Answers in Google Search Results"
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jun 9, 2020
Cite
Artur Strzelecki; Paulina Rutecka (2020). Data for study "Direct Answers in Google Search Results" [Dataset]. http://doi.org/10.5281/zenodo.3541092
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3541092
Dataset updated
Jun 9, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Artur Strzelecki; Paulina Rutecka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The goal of this research is to examine direct answers in the Google web search engine. The dataset was collected using Senuto (https://www.senuto.com/). Senuto is an online tool that extracts data on website visibility from the Google search engine.

Dataset contains the following elements:

keyword,

number of monthly searches,

featured domain,

featured main domain,

featured position,

featured type,

featured url,

content,

content length.

The dataset with visibility structure has 743 798 keywords that resulted in SERPs with a direct answer.
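
The elements listed above map naturally onto a typed record. The sketch below is illustrative only: the column headers (`monthly_searches`, `featured_main_domain`, and so on), the field types, and the sample row are assumptions made for the example, not the headers or contents of the published file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DirectAnswerRow:
    # Field names mirror the element list above; the exact CSV headers
    # and the int/str types are assumptions for this sketch.
    keyword: str
    monthly_searches: int
    featured_domain: str
    featured_main_domain: str
    featured_position: int
    featured_type: str
    featured_url: str
    content: str
    content_length: int


def parse_rows(text):
    """Parse CSV text with the assumed headers into typed records."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append(
            DirectAnswerRow(
                keyword=raw["keyword"],
                monthly_searches=int(raw["monthly_searches"]),
                featured_domain=raw["featured_domain"],
                featured_main_domain=raw["featured_main_domain"],
                featured_position=int(raw["featured_position"]),
                featured_type=raw["featured_type"],
                featured_url=raw["featured_url"],
                content=raw["content"],
                content_length=int(raw["content_length"]),
            )
        )
    return rows


# Made-up sample row, used only to exercise the parser.
SAMPLE = (
    "keyword,monthly_searches,featured_domain,featured_main_domain,"
    "featured_position,featured_type,featured_url,content,content_length\n"
    "example query,1000,blog.example.com,example.com,1,paragraph,"
    "https://blog.example.com/a,An example answer.,18\n"
)

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row.keyword, row.featured_main_domain, row.content_length)
```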
TREC 2022 Deep Learning test collection
catalog.data.gov
data.nist.gov
Updated May 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large scale datasets to TREC, and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
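
The "early precision" mentioned above is commonly measured as precision at a small cutoff k. The following is a generic sketch of that measure, not the track's official evaluation code; the function name and the document IDs are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are judged relevant.

    ranked_ids: list of document IDs, best first.
    relevant_ids: set of IDs with a positive relevance judgment.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


if __name__ == "__main__":
    ranking = ["d3", "d1", "d7", "d2"]  # made-up system output
    judged_relevant = {"d1", "d2"}      # made-up relevance judgments
    for cutoff in (1, 2, 4):
        print(cutoff, precision_at_k(ranking, judged_relevant, cutoff))
```

A shallow pool only judges the top few results per system, which is why measures with a small cutoff like this one fit that setting.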
google_search_terms_training_data
huggingface.co
Updated Jul 28, 2024
Cite
Hoshang Chenoy (2024). google_search_terms_training_data [Dataset]. https://huggingface.co/datasets/hoshangc/google_search_terms_training_data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 28, 2024
Authors
Hoshang Chenoy
Description
Dataset Card for Dataset Name

This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

Dataset Details

Dataset Description

Dataset Name: Google Search Trends Top Rising Search Terms
Description: The Google Search Trends Top Rising Search Terms dataset provides valuable insights into the most rapidly growing search queries on the Google search engine. It offers a comprehensive collection of trending search… See the full description on the dataset page: https://huggingface.co/datasets/hoshangc/google_search_terms_training_data.
Global Social Search Engine Market Growth (Status and Outlook) 2025-2031...
infinitymarketresearch.com
html
Updated Sep 26, 2025
Cite
Infinity Market Research (2025). Global Social Search Engine Market Growth (Status and Outlook) 2025-2031 Dataset [Dataset]. https://infinitymarketresearch.com/report/social-search-engine-market/7440
Explore at:
Available download formats: html
Dataset updated
Sep 26, 2025
Dataset authored and provided by
Infinity Market Research
License
https://infinitymarketresearch.com/termsandconditions
Description
Global Social Search Engine Market growth is projected to reach USD $ Billion in 2025, at a $% CAGR by driving industry size, share, segments research, top company analysis, trends and forecast report 2025 to 2031.
Next Generation Search Engines Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 4, 2025
Cite
Growth Market Reports (2025). Next Generation Search Engines Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/next-generation-search-engines-market-global-industry-analysis
Explore at:
Available download formats: pdf, pptx, csv
Dataset updated
Aug 4, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Next Generation Search Engines Market Outlook

According to our latest research, the global Next Generation Search Engines market size reached USD 16.2 billion in 2024, with a robust year-on-year growth driven by rapid technological advancements and escalating demand for intelligent search solutions across industries. The market is expected to witness a CAGR of 18.7% during the forecast period from 2025 to 2033, propelling the market to a projected value of USD 82.3 billion by 2033. The accelerating adoption of artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) within search technologies is a key growth factor, as organizations seek more accurate, context-aware, and personalized information retrieval solutions.
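
The relationship between a starting value, a compound annual growth rate (CAGR), and a projected value can be sketched as follows. This is a generic compound-growth calculation, not the report's own model. The number of compounding years (9, counting from 2024 to 2033) is an assumption, since the report does not state which base year or convention it uses, so the results need not reproduce the quoted figures exactly.

```python
def project_value(start_value, cagr, years):
    """Value after compounding `start_value` at rate `cagr` for `years` years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    start_2024 = 16.2  # USD billion, as quoted above
    end_2033 = 82.3    # USD billion, as quoted above
    years = 9          # assumption: compounding from 2024 through 2033

    print(f"Projected at 18.7% CAGR: {project_value(start_2024, 0.187, years):.1f}")
    print(f"Implied CAGR: {implied_cagr(start_2024, end_2033, years):.1%}")
```

Changing `years` shows how sensitive such projections are to the choice of base year.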

One of the most significant growth drivers for the Next Generation Search Engines market is the exponential increase in digital content and data generation worldwide. Enterprises and consumers alike are producing vast amounts of unstructured data daily, from documents and emails to social media posts and multimedia files. Traditional search engines often struggle to deliver relevant results from such complex datasets. Next generation search engines, powered by AI and ML algorithms, are uniquely positioned to address this challenge by providing semantic understanding, contextual relevance, and intent-driven results. This capability is especially critical for industries like healthcare, BFSI, and e-commerce, where timely and precise information retrieval can directly impact decision-making, operational efficiency, and customer satisfaction.

Another major factor fueling the growth of the Next Generation Search Engines market is the proliferation of mobile devices and the evolution of user interaction paradigms. As consumers increasingly rely on smartphones, tablets, and voice assistants, there is a growing demand for search solutions that support voice and visual queries, in addition to traditional text-based searches. Technologies such as voice search and visual search are gaining traction, enabling users to interact with search engines more naturally and intuitively. This shift is prompting enterprises to invest in advanced search platforms that can seamlessly integrate with diverse devices and channels, enhancing user engagement and accessibility. The integration of NLP further empowers these platforms to understand complex queries, colloquial language, and regional dialects, making search experiences more inclusive and effective.

Furthermore, the rise of enterprise digital transformation initiatives is accelerating the adoption of next generation search technologies across various sectors. Organizations are increasingly seeking to unlock the value of their internal data assets by deploying enterprise search solutions that can index, analyze, and retrieve information from multiple sources, including databases, intranets, cloud storage, and third-party applications. These advanced search engines not only improve knowledge management and collaboration but also support compliance, security, and data governance requirements. As businesses continue to embrace hybrid and remote work models, the need for efficient, secure, and scalable search capabilities becomes even more pronounced, driving sustained investment in this market.

Regionally, North America currently dominates the Next Generation Search Engines market, owing to the early adoption of AI-driven technologies, strong presence of leading technology vendors, and high digital literacy rates. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, expanding internet penetration, and increasing investments in AI research and development. Europe is also witnessing steady growth, supported by robust regulatory frameworks and growing demand for advanced search solutions in sectors such as BFSI, healthcare, and education. Latin America and the Middle East & Africa are gradually catching up, as enterprises in these regions recognize the value of next generation search engines in enhancing operational efficiency and customer experience.
ag_news_subset
tensorflow.org
Updated Dec 6, 2022
Cite
(2022). ag_news_subset [Dataset]. http://identifiers.org/arxiv:1509.01626
Explore at:
Unique identifier
https://identifiers.org/arxiv:1509.01626
Dataset updated
Dec 6, 2022
Description
AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .

The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('ag_news_subset', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Traffic. Location of traffic measuring points
data.europa.eu
unknown
Updated Jul 14, 2025
Cite
Ayuntamiento de Madrid (2025). Traffic. Location of traffic measuring points [Dataset]. https://data.europa.eu/data/datasets/https-datos-madrid-es-egob-catalogo-202468-0-intensidad-trafico
Explore at:
Available download formats: unknown (multiple files)
Dataset updated
Jul 14, 2025
Dataset authored and provided by
Ayuntamiento de Madrid
License
https://datos.madrid.es/egob/catalogo/aviso-legal
Description
This data set relates to traffic. It provides the history of traffic data since 2013, indicating the number of passing vehicles at each measurement point.

The infrastructure of measurement points available in the city of Madrid comprises:

7,360 vehicle detectors with the following characteristics: 71 include number plate reading devices; 158 have optical machine vision systems controlled from the Mobility Management Center; 1,245 are specific to fast roads and accesses to the city; and the remaining 5,886 have basic traffic light control systems.
More than 4,000 measuring points: 253 with systems for speed control, vehicle characterization and double reading loops; 70 of them make up the city's dedicated traffic-counting stations.
Automatic control systems for all the information obtained from the detectors, with continuous comparison against expected behavior patterns, as well as adherence to the guidelines set by the Technical Committee for Standardization AEN/CTN 199, in particular SC3 (specific applications relating to “Detectors and data collection stations”) and SC15 (relating to “Data quality”).

In this same portal you can find other related data sets, such as:

Traffic. Real-time traffic data: real-time information (updated every 5 minutes).
Traffic. Map of traffic intensity plots: the same information in KML format, with the possibility of viewing it in Google Maps or Google Earth.

There are other traffic-related data sets as well. You can search for them by entering the word 'Traffic' in the search engine (top right).
B2B Technographic Data in the US Techsalerator
kaggle.com
Updated Sep 8, 2024
Cite
Techsalerator (2024). B2B Technographic Data in the US Techsalerator [Dataset]. https://www.kaggle.com/datasets/techsalerator/technographic-data-in-the-united-states
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 8, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Techsalerator
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Techsalerator’s Business Technographic Data for the United States provides a thorough and insightful collection of information essential for businesses, market analysts, and technology vendors. This dataset offers a deep dive into the technological landscape of companies operating in the United States, capturing and categorizing data related to their technology stacks, digital tools, and IT infrastructure.

Please reach out to us at info@techsalerator.com or https://www.techsalerator.com/contact-us

Top 5 Most Utilized Data Fields

Company Name: This field lists the name of the company being analyzed. Understanding the companies helps technology vendors target their solutions and enables market analysts to evaluate technology adoption trends within specific businesses.
Technology Stack: This field details the technologies and software solutions a company utilizes, such as CRM systems, ERP software, and cloud services. Knowledge of a company’s technology stack is vital for understanding its operational capabilities and technology needs.
Deployment Status: This field indicates whether the technology is currently in use, planned for deployment, or under evaluation. This status helps vendors gauge the level of interest and current adoption among businesses.
Industry Sector: This field identifies the industry sector in which the company operates, such as finance, manufacturing, or retail. Segmenting by industry sector helps vendors tailor their offerings to specific market needs and trends.
Geographic Location: This field provides the geographic location of the company's headquarters or primary operations within the United States. This information is useful for regional market analysis and understanding local technology adoption patterns.

Top 5 Technology Trends in the United States

Artificial Intelligence and Machine Learning: AI and ML continue to drive innovation across various sectors, from autonomous vehicles and healthcare to finance and customer service. Key advancements include natural language processing, computer vision, and reinforcement learning.
Cloud Computing and Edge Computing: The shift towards cloud computing remains strong, with major providers like AWS, Azure, and Google Cloud leading the way. Edge computing is also gaining traction, enabling faster processing and data analysis closer to the source, which is crucial for IoT applications.
5G Technology: The rollout of 5G networks is transforming connectivity, enabling faster data speeds, lower latency, and new applications in IoT, smart cities, and augmented reality (AR). Major telecom companies and technology providers are heavily invested in this technology.
Cybersecurity and Privacy: As digital threats become more sophisticated, there is an increased focus on cybersecurity solutions, including threat detection, data encryption, and privacy protection. Innovations in this space aim to combat ransomware, data breaches, and other cyber risks.
Blockchain and Decentralized Finance (DeFi): Blockchain technology is expanding beyond cryptocurrencies, with applications in supply chain management, digital identity, and smart contracts. DeFi is a growing sector within blockchain, offering decentralized financial services and products.

Top 5 Companies with Notable Technographic Data in the United States

Microsoft: A leading technology company known for its software, cloud computing services (Azure), and AI research. Microsoft's diverse portfolio includes operating systems, enterprise solutions, and gaming (Xbox).
Google (Alphabet Inc.): A major player in search engines, cloud computing, AI, and consumer electronics. Google is at the forefront of innovations in machine learning, autonomous driving (Waymo), and digital advertising.
Amazon: Known for its e-commerce platform, Amazon is also a significant force in cloud computing (AWS), AI, and logistics. AWS is a leading cloud service provider, and Amazon's technology initiatives span various industries.
Apple Inc.: Renowned for its consumer electronics, including iPhones, iPads, and Macs. Apple is also investing in emerging technologies such as AR, wearable technology (Apple Watch), and health tech.
IBM: A historic leader in technology and consulting services, IBM focuses on enterprise solutions, cloud computing, AI (IBM Watson), and quantum computing. The company is known for its research and development in cutting-edge technologies.

Accessing Techsalerator’s Business Technographic Data

If you’re interested in obtaining Techsalerator’s Business Technographic Data for the United States, please contact info@techsalerator.com with your specific requirements. Techsalerator will provide a customized quote based on the number of data fields and records you need, with the dataset available for delivery within 24 hours. Ongoing access options can also be discussed as needed.

Included Data Fields Company Name Technology Stack Depl...
Dataset Search WebApp
figshare.com
zip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme (2023). Dataset Search WebApp [Dataset]. http://doi.org/10.6084/m9.figshare.5217958.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5217958.v2
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme
License
https://www.gnu.org/copyleft/gpl.html
Description
Although extensive lists of open datasets are available in catalogues, most data publishers still connect their datasets to other popular datasets, such as DBpedia, Freebase and Geonames. Although linkage with popular datasets would allow us to explore external resources, it would fail to cover highly specialized information. Catalogues of linked data describe the content of datasets in terms of the update periodicity, authors, SPARQL endpoints, linksets with other datasets, amongst others, as recommended by the W3C VoID Vocabulary. However, catalogues by themselves do not provide any explicit information to help the URI linkage process.

Searching techniques can rank the available datasets Si according to the probability that it will be possible to define links between URIs of Si and a given dataset T to be published, so that most of the links, if not all, could be found by inspecting the most relevant datasets in the ranking. dataset-search is a tool for searching datasets for linkage.
Interface Element Frequencies in Search Engine Results Pages (SERPs) Across...
rdm.inesctec.pt
Updated Jul 22, 2025
Cite
(2025). Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages [Dataset]. https://rdm.inesctec.pt/dataset/cs-2025-006
Explore at:
Dataset updated
Jul 22, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains the data produced for the dissertation "User Interface Variations in Search Engine Results Pages Across Types of Search Queries and Search Engines". The project was conducted by student Adelaide Miranda Santos at FEUP, University of Porto, as part of the Masters in Informatics and Computing Engineering. The primary objective of this work is to study interface variations in search engine results pages (SERPs) across different search engines and types of search queries. To this end, nearly 8,000 SERPs were captured using the ORCAS-I-gold dataset across six leading web search engines: Google, Microsoft Bing, Yandex, Yahoo!, Baidu, and DuckDuckGo. For each captured SERP, the number of occurrences of each interface element was recorded. Additionally, to analyze how the language of a search query affects SERP composition in Yandex and Baidu, the original English queries were translated into Russian and Simplified Chinese.

The dataset is organized in the following folders:

Search Query Dataset Translation

Contains the search queries from the ORCAS-I-gold dataset translated into Russian and Simplified Chinese. The translation was made using ChatGPT-4o and verified by native speakers. In addition to the translated queries, the complete original ORCAS-I-gold dataset is also included as an independent resource.

SERP Captures

Includes HTML files of the search engine results pages collected from Baidu, Microsoft Bing, DuckDuckGo, Google, Yahoo!, and Yandex. Each top-level subfolder is named after the respective search engine. Within each of these, there are folders named according to the language and the query intent associated with the search query. These folders contain the corresponding SERP HTML files. File names represent the search queries and may be either encoded or displayed as in the original dataset.

Occurrence of Elements per SERP

For each captured SERP, we recorded the frequency of each interface element. This data is organized in a relational database structure composed of the following CSV files:

- elements.csv: Lists all identified SERP elements along with their corresponding IDs, categories, types, and subtypes (if applicable).
- identifiers.csv: Contains the selectors or identifiers used for automatic detection of each element, along with their associated element ID, identifier ID, and the corresponding search engine ID.
- intents.csv: Maps query intent names to their corresponding intent IDs.
- search-engines.csv: Maps search engine names to their corresponding IDs.
- main.csv: Records the frequency of each element in each captured SERP. Each row represents an observation and includes the following fields: element ID, identifier ID, search engine ID, query language, intent ID, query ID (as defined in the ORCAS-I-gold dataset), and the number of occurrences.
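The relational layout described above can be explored by joining main.csv against the lookup files. The following is a minimal sketch only: the column names (element_id, search_engine_id, occurrences, name) and the sample rows are assumptions made for illustration, since the description does not give the actual CSV headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical headers and rows; the real files may use different column names.
ELEMENTS_CSV = """element_id,name
1,organic result
2,ad
"""
ENGINES_CSV = """search_engine_id,name
1,Google
2,Baidu
"""
MAIN_CSV = """element_id,search_engine_id,occurrences
1,1,10
2,1,3
1,2,8
1,1,9
"""


def load_lookup(text, key, value):
    """Build an ID -> name dictionary from a lookup CSV."""
    return {row[key]: row[value] for row in csv.DictReader(io.StringIO(text))}


def total_occurrences(main_text, elements, engines):
    """Sum occurrences per (search engine name, element name) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(main_text)):
        key = (engines[row["search_engine_id"]], elements[row["element_id"]])
        totals[key] += int(row["occurrences"])
    return dict(totals)


elements = load_lookup(ELEMENTS_CSV, "element_id", "name")
engines = load_lookup(ENGINES_CSV, "search_engine_id", "name")
totals = total_occurrences(MAIN_CSV, elements, engines)

for (engine, element), count in sorted(totals.items()):
    print(f"{engine:8} {element:16} {count}")
```

With the real files, the inline strings would be replaced by open() calls on the downloaded CSVs, and the keys adjusted to match the actual headers.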
Data from: Inventory of online public databases and repositories holding...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

The goals of this study were to:

- establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
- compare how much data is in institutional vs. domain-specific vs. federal platforms
- determine which repositories are recommended by top journals that require or recommend the publication of supporting data
- ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

- Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
- There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
- Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:

- Resource Title: Journals. File Name: Journals.csv
- Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
- Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
- Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
- Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
- Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
- Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
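The threshold checks described in the evaluation (a term's share of the total collection against the 1% and 5% marks, and its raw hit count against 100 and 500) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name, field names, and all numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    term: str  # search term used in the repository
    hits: int  # number of results returned for that term


def evaluate(total_datasets, results):
    """Flag each search term against the share and hit-count thresholds."""
    rows = []
    for r in results:
        share = r.hits / total_datasets
        rows.append({
            "term": r.term,
            "share_pct": round(100 * share, 2),
            "at_least_1pct": share >= 0.01,
            "at_least_5pct": share >= 0.05,
            "over_100_hits": r.hits > 100,
            "over_500_hits": r.hits > 500,
        })
    return rows


# Invented example: a repository holding 10,000 datasets in total.
rows = evaluate(10_000, [TermResult("agriculture", 620), TermResult("crop", 90)])
for row in rows:
    print(row)
```

In this invented case the first term passes both the 5% and the 500-result thresholds, while the second passes none of them.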
The major statistical data of natural referencing | gimi9.com
gimi9.com
Updated Nov 30, 2024
Cite
(2024). The major statistical data of natural referencing | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_65f594ba5cf5f141524928b6/
Dataset updated
Nov 30, 2024
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset gathers the most crucial SEO statistics for the year, providing an overview of the dominant trends and best practices in the field of search engine optimization. Aimed at digital marketing professionals, site owners, and SEO analysts, this collection of information serves as a guide to navigate the evolving SEO landscape with confidence and accuracy.

Mode of Data Production: The statistics have been carefully selected and compiled from a variety of credible and recognized sources in the SEO industry, including research reports, web traffic data analytics, and consumer and marketing professional surveys. Each statistic was checked for reliability and relevance to current trends.

Categories Included:

- User search behaviour: Statistics on the evolution of search modes, including voice and mobile search.
- Mobile Optimisation: Data on the importance of site optimization for mobile devices.
- Importance of Backlinks: Insights on the role of backlinks in SEO ranking and the need to prioritize quality.
- Content quality: Statistics highlighting the importance of relevant and engaging content for SEO.
- Search engine algorithms: Information on the impact of algorithm updates on SEO strategies.

Usefulness of the Data: This dataset is designed to help users quickly understand current SEO dynamics and apply that knowledge in optimizing their digital marketing strategies. It provides a solid foundation for benchmarking, strategic planning, and informed decision-making in the field of SEO.

Update and Accessibility: To ensure relevance and timeliness, the dataset will be regularly updated with new information and emerging trends in the SEO world.
Ads from context advertising
kaggle.com
zip
Updated Feb 17, 2017
Cite
Kotobotov (2017). Ads from context advertising [Dataset]. https://www.kaggle.com/kotobotov/context-advertising
Explore at:
Available download formats: zip (9888139 bytes)
Dataset updated
Feb 17, 2017
Authors
Kotobotov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

To build an assistant for a content-advertising system, an automated generator of ad content was needed, so a large number of ads were collected from a popular search engine (Russian ad campaigns only). If anyone is interested, I can upload the full 40+ GB of data.

Content

The database was collected from open public sources and contains ads from regions of Russia, Ukraine, Belarus, Kazakhstan and the major cities of these countries.

Unique items: 800 000 (part 1). Total size: about 15MM.

Acknowledgements

The database was collected between October 2016 and January 2017. No one was harmed when collecting the database (the program does not click on the ads).

Inspiration

Try to find patterns in the ads and develop an automatic text generator for ad systems.
MOESM2 of ExCAPE-DB: an integrated large scale dataset facilitating Big Data...
springernature.figshare.com
xlsx
Updated May 31, 2023
Cite
Jiangming Sun; Nina Jeliazkova; Vladimir Chupakin; Jose-Felipe Golib-Dzib; Ola Engkvist; Lars Carlsson; Jörg Wegner; Hugo Ceulemans; Ivan Georgiev; Vedrin Jeliazkov; Nikolay Kochev; Thomas Ashby; Hongming Chen (2023). MOESM2 of ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics [Dataset]. http://doi.org/10.6084/m9.figshare.c.3711712_D2.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3711712_D2.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jiangming Sun; Nina Jeliazkova; Vladimir Chupakin; Jose-Felipe Golib-Dzib; Ola Engkvist; Lars Carlsson; Jörg Wegner; Hugo Ceulemans; Ivan Georgiev; Vedrin Jeliazkov; Nikolay Kochev; Thomas Ashby; Hongming Chen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Additional file 2: Table S1. The list of selected activity types in the PubChem.
Search Engineing Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Search Engineing Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/search-engine-marketing-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Search Engine Market Outlook

The search engine market size was valued at approximately USD 124 billion in 2023 and is projected to reach USD 258 billion by 2032, witnessing a robust CAGR of 8.5% during the forecast period. This growth is largely attributed to the increasing reliance on digital platforms and the internet across various sectors, which has necessitated the use of search engines for data retrieval and information dissemination. With the proliferation of smartphones and the expansion of internet access globally, search engines have become indispensable tools for both businesses and consumers, driving the market's upward trajectory. The integration of artificial intelligence and machine learning technologies into search engines is transforming the way search engines operate, offering more personalized and efficient search results, thereby further propelling market growth.
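As a quick consistency check on the figures above, the compound annual growth rate implied by the two market-size values can be computed directly. This is a minimal sketch, assuming nine compounding years between 2023 and 2032; the function name is hypothetical and the report does not state how its CAGR was derived.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.085 means 8.5%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: USD 124 billion (2023) to USD 258 billion (2032).
rate = cagr(124, 258, 2032 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the implied rate rounds to the 8.5% quoted in the report.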

One of the primary growth factors in the search engine market is the ever-increasing digitalization across industries. As businesses continue to transition from traditional modes of operation to digital platforms, the need for search engines to navigate and manage data becomes paramount. This shift is particularly evident in industries such as retail, BFSI, and healthcare, where vast amounts of data are generated and require efficient management and retrieval systems. The integration of AI and machine learning into search engine algorithms has enhanced their ability to process and interpret large datasets, thereby improving the accuracy and relevance of search results. This technological advancement not only improves user experience but also enhances the competitive edge of businesses, further fueling market growth.

Another significant growth factor is the expanding e-commerce sector, which relies heavily on search engines to connect consumers with products and services. With the rise of e-commerce giants and online marketplaces, consumers are increasingly using search engines to find the best prices, reviews, and availability of products, leading to a surge in search engine usage. Additionally, the implementation of voice search technology and the growing popularity of smart home devices have introduced new dynamics to search engine functionality. Consumers are now able to conduct searches verbally, which has necessitated the adaptation of search engines to incorporate natural language processing capabilities, further driving market growth.

The advertising and marketing sectors are also contributing significantly to the growth of the search engine market. Businesses are leveraging search engines as a primary tool for online advertising, given their wide reach and ability to target specific audiences. Pay-per-click advertising and search engine optimization strategies have become integral components of digital marketing campaigns, enabling businesses to enhance their visibility and engagement with potential customers. The measurable nature of these advertising techniques allows businesses to assess the effectiveness of their campaigns and make data-driven decisions, thereby increasing their reliance on search engines and contributing to overall market growth.

The evolution of search engines is closely tied to the development of AI Enterprise Search, which is revolutionizing how businesses access and utilize information. AI Enterprise Search leverages artificial intelligence to provide more accurate and contextually relevant search results, making it an invaluable tool for organizations that manage large volumes of data. By understanding user intent and learning from past interactions, AI Enterprise Search systems can deliver personalized experiences that enhance productivity and decision-making. This capability is particularly beneficial in sectors such as finance and healthcare, where quick access to precise information is crucial. As businesses continue to digitize and data volumes grow, the demand for AI Enterprise Search solutions is expected to increase, further driving the growth of the search engine market.

Regionally, North America holds a significant share of the search engine market, driven by the presence of major technology companies and a well-established digital infrastructure. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. This growth can be attributed to the rapid digital transformation in emerging economies such as China and India, where increasing internet penetration and smartphone adoption are driving demand for search engines. Additionally, government initiatives to
Search Engine Optimisation (SEO) Strategy as Determinants to Enhance the...
b2find.eudat.eu
Updated Jun 30, 2021
Cite
(2021). Search Engine Optimisation (SEO) Strategy as Determinants to Enhance the Online Brand Positioning - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7bd815f4-4e9c-568b-80d1-6f1dddf2ba67
Explore at:
Dataset updated
Jun 30, 2021
Description
The main purpose of this research is to identify the persistence of using an SEO strategy, including the use of a niche point of differentiation, valuable content, targeted keywords, and scalable link building, as the determinants that enhance the success of online brand positioning. Date Submitted: 2021-06-30
ExCAPE-DB
solr.ideaconsult.net
csv
Updated Nov 29, 2016
Cite
H2020 ExCAPE (2016). ExCAPE-DB [Dataset]. https://solr.ideaconsult.net/search/excape/
Explore at:
Available download formats: csv
Dataset updated
Nov 29, 2016
Dataset authored and provided by
H2020 ExCAPE
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
ExcapeDB: An integrated large scale dataset facilitating Big Data analysis in chemogenomics
