32 datasets found

AI-Powered Search Features: Preparing for Google SGE and Bing Chat...
caseysseo.com
txt
Updated Aug 21, 2025
Cite
Casey Miller (2025). AI-Powered Search Features: Preparing for Google SGE and Bing Chat Integration [Dataset]. https://caseysseo.com/ai-powered-search-features-preparing-for-google-sge-and-bing-chat-integration
Explore at:
Available download formats: txt
Dataset updated
Aug 21, 2025
Dataset provided by
Casey's SEO
Authors
Casey Miller
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Variables measured
Colorado Springs Mobile Search Growth, Increase in Quality Leads from Bing Chat, Increase in Leads for Colorado Springs Contractor, Increase in Clicks for Local Businesses in Google SGE, Average Number of Source Citations per Google SGE Answer, Percentage of Local Service Queries with AI-Powered Features
Measurement technique
Qualitative and quantitative data from search engine performance monitoring, First-hand observations from local business optimization campaigns, Industry research and analysis
Description
This dataset provides detailed information about the rise of AI-powered search features, such as Google's Search Generative Experience (SGE) and Bing's Chat integration, and how local businesses can optimize their online presence to capitalize on these new search trends. The dataset covers the current state of AI search features, the unique opportunities and challenges for local businesses, and actionable strategies for improving visibility in this evolving search landscape.
Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
zenodo.org
data.niaid.nih.gov
csv
Updated Mar 1, 2023
Cite
Fabian Haak; Philipp Schaer (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. http://doi.org/10.5281/zenodo.7682915
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.7682915
Dataset updated
Mar 1, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Fabian Haak; Philipp Schaer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci'23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. The AllSides balanced news roundups feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles.
Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The more recent versions in the repository also carry additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions at the respective positions returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server; its location is saved in "location".
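The column layout described above can be read with a standard CSV reader. The following is a minimal sketch, assuming the header of suggestions.csv uses exactly the column names quoted in the description; the sample rows and the helper name count_by_engine are invented for illustration and are not records from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the column layout described above;
# the values are illustrative, not real records from suggestions.csv.
SAMPLE = """root_term,query_input,search_engine,query_suggestion,rank,datetime,location
democrats,democrats a,google,democrats and recession,1,2022-11-01 10:00:00,US
democrats,democrats a,bing,democrats abroad,1,2022-11-01 10:00:05,US
democrats,democrats a,bing,democrats and republicans,2,2022-11-01 10:00:05,US
"""


def count_by_engine(handle):
    """Count suggestion rows per search engine."""
    reader = csv.DictReader(handle)
    return Counter(row["search_engine"] for row in reader)


counts = count_by_engine(io.StringIO(SAMPLE))
print(dict(counts))
```

To run the same count on the real file, the in-memory sample would be swapped for an opened file handle.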

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
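The query expansion described above (the root term plus the root term followed by each letter a to z) can be sketched as follows. The function name expand_root_query is hypothetical, and the sketch only builds the query inputs; it does not contact any search engine.

```python
import string


def expand_root_query(root_term):
    """Return the root term plus one query input per letter a to z."""
    return [root_term] + [f"{root_term} {letter}" for letter in string.ascii_lowercase]


inputs = expand_root_query("democrats")

# With at most ten suggestions per query input, the upper bound per topic
# and search engine is 27 inputs * 10 suggestions.
max_suggestions = len(inputs) * 10
print(len(inputs), max_suggestions)
```

This reproduces the stated upper bound of 270 query suggestions per topic and search engine.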

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.
comp-serp-data
huggingface.co
Cite
Goker Cebeci, comp-serp-data [Dataset]. https://huggingface.co/datasets/goker/comp-serp-data
Explore at:
Authors
Goker Cebeci
Description
Comprehensive SERP Data

This dataset contains comprehensive search engine ranking data collected from Google and Bing, along with extracted technical and content features for analyzing search engine ranking algorithms.

📊 Dataset Overview

Total Records: 14,465 search results
Search Engines: Google (5,895 results) and Bing (8,570 results)
Keywords: 500 diverse search queries
Features: 20 features including technical scores, content analysis, and ranking metadata… See the full description on the dataset page: https://huggingface.co/datasets/goker/comp-serp-data.
Schema Markup Implementation: Structured Data Strategies for Google and Bing...
caseysseo.com
txt
Updated Aug 21, 2025
Cite
Casey Miller (2025). Schema Markup Implementation: Structured Data Strategies for Google and Bing Visibility [Dataset]. https://caseysseo.com/schema-markup-implementation-structured-data-strategies-for-google-and-bing-visibility
Explore at:
Available download formats: txt
Dataset updated
Aug 21, 2025
Dataset provided by
Casey's SEO
Authors
Casey Miller
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
Colorado Springs
Variables measured
Content Word Count, Click-through Rate Improvement, Trust Factor for Online Reviews, Military Population in Colorado Springs, Mobile Search Growth in Colorado Springs, Percentage of Websites Using Structured Data
Measurement technique
Reviewed industry reports and guidelines from leading search engines, Conducted customer surveys to gather feedback on purchasing decisions, Analyzed historical website performance data and search metrics
Description
This dataset provides comprehensive information on the importance of schema markup implementation for improving search visibility on Google and Bing. It covers the benefits of schema markup, the most impactful schema types for businesses, and effective implementation strategies. The dataset includes details on the creator, publisher, content coverage, data sources, and quantitative metrics related to the schema markup impact.
hyperlinks
huggingface.co
Cite
Goker Cebeci, hyperlinks [Dataset]. https://huggingface.co/datasets/goker/hyperlinks
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Authors
Goker Cebeci
Description
Hyperlinks Dataset

This dataset contains subpage links, their features, and corresponding search engine rankings from Google and Bing. The data was collected as part of the research project: "Accessible Hyperlinks and Search Engine Rankings: An Empirical Investigation".

Dataset Description

This dataset is designed to facilitate research on the relationship between website accessibility, specifically hyperlink accessibility, and search engine rankings. It consists of… See the full description on the dataset page: https://huggingface.co/datasets/goker/hyperlinks.
Data from: Inventory of online public databases and repositories holding...
catalog.data.gov
agdatacommons.nal.usda.gov
+1more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

The goals of this study were to:
- establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
- compare how much data is in institutional vs. domain-specific vs. federal platforms
- determine which repositories are recommended by top journals that require or recommend the publication of supporting data
- ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of "agricultural data" / "ag data" / "scientific data" + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of "agriculture", "ag data" and "university" to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals were compiled, in which USDA published in 2012 and 2016, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:
- Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
- There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
- Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:
- Resource Title: Journals. File Name: Journals.csv
- Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
- Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
- Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
- Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
- Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
- Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
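The evaluation thresholds described in this entry (a search term's share of the collection at 1% and 5%, and result counts above 100 and above 500) can be illustrated with a small sketch. The function name flag_search_term and the example counts are hypothetical; they are not taken from the dataset files, and the study's actual tabulation may have differed.

```python
def flag_search_term(term_hits, total_records):
    """Return the share of the collection and the threshold flags for one search term."""
    share = term_hits / total_records if total_records else 0.0
    return {
        "share": share,
        "at_least_1_percent": share >= 0.01,
        "at_least_5_percent": share >= 0.05,
        "over_100_results": term_hits > 100,
        "over_500_results": term_hits > 500,
    }


# Hypothetical repository: 10,000 records, 600 of which match one ag search term.
large_share = flag_search_term(term_hits=600, total_records=10_000)

# Hypothetical repository: 20,000 records, 100 of which match.
small_share = flag_search_term(term_hits=100, total_records=20_000)

print(large_share)
print(small_share)
```

A repository meeting both the over-500 and the 5% conditions corresponds to the group the entry names in its results summary.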
Interface Element Frequencies in Search Engine Results Pages (SERPs) Across...
rdm.inesctec.pt
Updated Jul 22, 2025
Cite
(2025). Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages [Dataset]. https://rdm.inesctec.pt/dataset/cs-2025-006
Explore at:
Dataset updated
Jul 22, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains the data produced for the dissertation "User Interface Variations in Search Engine Results Pages Across Types of Search Queries and Search Engines". The project was conducted by student Adelaide Miranda Santos at FEUP, University of Porto, as part of the Masters in Informatics and Computing Engineering. The primary objective of this work is to study interface variations in search engine results pages (SERPs) across different search engines and types of search queries. To this end, nearly 8,000 SERPs were captured using the ORCAS-I-gold dataset across six leading web search engines: Google, Microsoft Bing, Yandex, Yahoo!, Baidu, and DuckDuckGo. For each captured SERP, the number of occurrences of each interface element was recorded. Additionally, to analyze how the language of a search query affects SERP composition in Yandex and Baidu, the original English queries were translated into Russian and Simplified Chinese.

The dataset is organized in the following folders:

Search Query Dataset Translation

Contains the search queries from the ORCAS-I-gold dataset translated into Russian and Simplified Chinese. The translation was made using ChatGPT-4o and verified by native speakers. In addition to the translated queries, the complete original ORCAS-I-gold dataset is also included as an independent resource.

SERP Captures

Includes HTML files of the search engine results pages collected from Baidu, Microsoft Bing, DuckDuckGo, Google, Yahoo!, and Yandex. Each top-level subfolder is named after the respective search engine. Within each of these, there are folders named according to the language and the query intent associated with the search query. These folders contain the corresponding SERP HTML files. File names represent the search queries and may be either encoded or displayed as in the original dataset.

Occurrence of Elements per SERP

For each captured SERP, we recorded the frequency of each interface element. This data is organized in a relational database structure composed of the following CSV files:
- elements.csv: Lists all identified SERP elements along with their corresponding IDs, categories, types, and subtypes (if applicable).
- identifiers.csv: Contains the selectors or identifiers used for automatic detection of each element, along with their associated element ID, identifier ID, and the corresponding search engine ID.
- intents.csv: Maps query intent names to their corresponding intent IDs.
- search-engines.csv: Maps search engine names to their corresponding IDs.
- main.csv: Records the frequency of each element in each captured SERP. Each row represents an observation and includes the following fields: element ID, identifier ID, search engine ID, query language, intent ID, query ID (as defined in the ORCAS-I-gold dataset), and the number of occurrences.
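The relational layout of these CSV files lends itself to a simple join on the ID fields. The sketch below is a minimal illustration under stated assumptions: the exact header names (element_id, search_engine_id, occurrences, and so on) are not given in the description and are assumed here, and the miniature tables are invented rather than copied from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature versions of three of the files; the column names
# are assumptions, since the description lists the fields only informally.
SEARCH_ENGINES = "search_engine_id,name\n1,Google\n2,Microsoft Bing\n"
ELEMENTS = "element_id,category,type\n10,Result,Organic\n11,Ad,Text\n"
MAIN = (
    "element_id,identifier_id,search_engine_id,query_language,intent_id,query_id,occurrences\n"
    "10,100,1,en,1,q1,9\n"
    "11,101,1,en,1,q1,2\n"
    "10,102,2,en,1,q1,8\n"
)


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def totals_by_engine_and_type(main, elements, engines):
    """Sum element occurrences per (search engine name, element type)."""
    engine_names = {row["search_engine_id"]: row["name"] for row in engines}
    element_types = {row["element_id"]: row["type"] for row in elements}
    totals = defaultdict(int)
    for row in main:
        key = (engine_names[row["search_engine_id"]], element_types[row["element_id"]])
        totals[key] += int(row["occurrences"])
    return dict(totals)


totals = totals_by_engine_and_type(
    read_rows(MAIN), read_rows(ELEMENTS), read_rows(SEARCH_ENGINES)
)
print(totals)
```

The lookup dictionaries stand in for a database join; with the real files, the header names would need to be checked first.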
Search Engines Comparison and Websites Performance
zenodo.org
data.niaid.nih.gov
bin
Updated Jul 1, 2023
Cite
Georgios Ntimo; Vasilios Ntararas (2023). Search Engines Comparison and Websites Performance [Dataset]. http://doi.org/10.5281/zenodo.8102700
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.8102700
Dataset updated
Jul 1, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Georgios Ntimo; Vasilios Ntararas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of 200 search results extracted from the Google and Bing search engines (100 from Google and 100 from Bing). The search terms were selected from the 10 most searched keywords of 2021, based on data provided by Google Trends. The remaining sheets report the performance of the websites on three technical evaluation aspects: SEO, speed, and security. The performance dataset was built using the CheckBot crawling tool. The whole dataset can help information retrieval scientists compare the two engines in terms of position/ranking and performance on these factors.

For more information about the reasoning behind the structure of the dataset, please contact the Information Management Lab of the University of West Attica.

Contact persons: Vasilis Ntararas (lb17032@uniwa.gr), Georgios Ntimo (lb17100@uniwa.gr), and Ioannis C. Drivas (idrivas@uniwa.gr)
Human Interaction Image (HII) dataset
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Junnan Li; Yongkang Wong; Qi Zhao; Mohan S. Kankanhalli (2020). Human Interaction Image (HII) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_832379
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
National University of Singapore
University of Minnesota
Authors
Junnan Li; Yongkang Wong; Qi Zhao; Mohan S. Kankanhalli
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Human Interaction Image (HII) dataset is a new dataset containing Web images from Commercial Search Engines (Google, Bing and Flickr). We use keyword search to collect images corresponding to four types of interactions: handshake, highfive, hug, kiss. Then we manually filter the irrelevant images. The dataset contains 2410 images with at least 550 images per interaction.

The dataset can be applied to, but is not limited to, the following research areas:

interaction recognition/prediction

action recognition

video analysis

transfer learning

Please cite the following paper if you use the HII dataset in your work (papers, articles, reports, books, software, etc):

J. Li, Y. Wong, Q. Zhao, M. Kankanhalli. Attention Transfer from Web Images for Video Recognition. ACM Multimedia, 2017. http://doi.org/10.1145/3123266.3123432
Indianfoodnet Dataset
universe.roboflow.com
zip
Updated Dec 4, 2023
Cite
IndianFoodNet (2023). Indianfoodnet Dataset [Dataset]. https://universe.roboflow.com/indianfoodnet/indianfoodnet/model/1
Explore at:
Available download formats: zip
Dataset updated
Dec 4, 2023
Dataset authored and provided by
IndianFoodNet
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Indian Dishes Bounding Boxes
Description
IndianFoodNet-30

About IndianFoodNet-30

IndianFoodNet-30 is created by Ritu Agarwal, Nikunj Bansal, Tanupriya Choudhury, Tanmay Sarkar & Neelu Jyothi Ahuja with a goal of building an Indian Food detection model. It contains more than 5500 images of 30 popular Indian food items.

Data collection

We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.

Fair use

This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

Citation

If you find our dataset useful, please cite us as:

@dataset{dataset,
  author = {Agarwal, Ritu and Bansal, Nikunj and Choudhury, Tanupriya and Sarkar, Tanmay and J.Ahuja, Neelu},
  year = {2023},
  title = {IndianFoodNet-30 Dataset},
  publisher = {Roboflow Universe},
  url = {https://universe.roboflow.com/indianfoodnet/indianfoodnet},
}
marketing-ai-agent
huggingface.co
Updated Aug 30, 2025
+ more versions
Cite
DeepNLP (2025). marketing-ai-agent [Dataset]. https://huggingface.co/datasets/DeepNLP/marketing-ai-agent
Explore at:
Dataset updated
Aug 30, 2025
Authors
DeepNLP
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Marketing Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP

This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org and contains AI agents' meta information, such as each agent's name, website, and description, as well as monthly updated Web performance metrics, including Google and Bing average search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful for AI… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/marketing-ai-agent.
Indian_food Dataset
universe.roboflow.com
zip
Updated Jul 16, 2024
Cite
IndianFood (2024). Indian_food Dataset [Dataset]. https://universe.roboflow.com/indianfood/indian_food-pwzlc/dataset/2
Explore at:
Available download formats: zip
Dataset updated
Jul 16, 2024
Dataset authored and provided by
IndianFood
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Indian Food Bounding Boxes
Description
IndianFood-7

About IndianFood-7

IndianFood-7 is created by Ritu Agarwal, Nikunj Bansal, Tanmay Sarkar, Tanupriya Choudhury and Neelu Jyothi Ahuja with a goal of building an Indian Food detection model. It contains more than 800 images of 7 popular Indian food items.

Data collection

We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.

Fair use

This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

Food_new Dataset

universe.roboflow.com

zip

Updated Jul 16, 2024

Cite

Allergen30 (2024). Food_new Dataset [Dataset]. https://universe.roboflow.com/allergen30/food_new-uuulf/dataset/2

Explore at:

Available download formats: zip

Dataset updated

Jul 16, 2024

Dataset authored and provided by

Allergen30

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Variables measured

Food Bounding Boxes

Description

Allergen30

About Allergen30

Allergen30 is created by Mayank Mishra, Nikunj Bansal, Tanmay Sarkar and Tanupriya Choudhury with a goal of building a robust detection model that can assist people in avoiding possible allergic reactions.

It contains more than 6,000 images of 30 commonly used food items which can cause an adverse reaction within a human body. This dataset is one of the first research attempts in training a deep learning based computer vision model to detect the presence of such food items from images. It also serves as a benchmark for evaluating the efficacy of object detection methods in learning the otherwise difficult visual cues related to food items.

Description of class labels

Multiple food items associated with specific food intolerances can trigger an allergic reaction. These intolerances primarily include Lactose, Histamine, Gluten, Salicylate, Caffeine and Ovomucoid intolerance.

Image: Food intolerance (https://github.com/mmayank74567/mmayank74567.github.io/blob/master/images/FoodIntol.png?raw=true)

The following table contains the description relating to the 30 class labels in our dataset.

S. No.	Allergen	Food label	Description
1	Ovomucoid	egg	Images of egg with yolk (e.g. sunny side up eggs)
2	Ovomucoid	whole_egg_boiled	Images of soft and hard boiled eggs
3	Lactose/Histamine	milk	Images of milk in a glass
4	Lactose	icecream	Images of icecream scoops
5	Lactose	cheese	Images of swiss cheese
6	Lactose/Caffeine	milk_based_beverage	Images of tea/coffee with milk in a cup/glass
7	Lactose/Caffeine	chocolate	Images of chocolate bars
8	Caffeine	non_milk_based_beverage	Images of soft drinks and tea/coffee without milk in a cup/glass
9	Histamine	cooked_meat	Images of cooked meat
10	Histamine	raw_meat	Images of raw meat
11	Histamine	alcohol	Images of alcohol bottles
12	Histamine	alcohol_glass	Images of wine glasses with alcohol
13	Histamine	spinach	Images of spinach bundle
14	Histamine	avocado	Images of avocado sliced in half
15	Histamine	eggplant	Images of eggplant
16	Salicylate	blueberry	Images of blueberry
17	Salicylate	blackberry	Images of blackberry
18	Salicylate	strawberry	Images of strawberry
19	Salicylate	pineapple	Images of pineapple
20	Salicylate	capsicum	Images of bell pepper
21	Salicylate	mushroom	Images of mushrooms
22	Salicylate	dates	Images of dates
23	Salicylate	almonds	Images of almonds
24	Salicylate	pistachios	Images of pistachios
25	Salicylate	tomato	Images of tomato and tomato slices
26	Gluten	roti	Images of roti
27	Gluten	pasta	Images of one serving of penne pasta
28	Gluten	bread	Images of bread slices
29	Gluten	bread_loaf	Images of bread loaf
30	Gluten	pizza	Images of pizza and pizza slices
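
As a rough illustration of how the class-label table can be used, the sketch below builds a lookup from food label to allergens and back. It is not part of the Allergen30 release: the function names (`build_allergen_index`, `labels_for`) are hypothetical, and only a handful of rows from the table are included for brevity.

```python
# A subset of rows from the class-label table: food label, then allergen(s).
# Multiple allergens are separated by "/", as in the table.
ROWS = """\
egg\tOvomucoid
milk\tLactose/Histamine
milk_based_beverage\tLactose/Caffeine
chocolate\tLactose/Caffeine
spinach\tHistamine
bread\tGluten
"""


def build_allergen_index(rows: str) -> dict[str, list[str]]:
    """Map each food label to its list of allergens."""
    index = {}
    for line in rows.strip().splitlines():
        label, allergens = line.split("\t")
        # strip() tolerates stray spaces around the "/" separator
        index[label] = [a.strip() for a in allergens.split("/")]
    return index


def labels_for(index: dict[str, list[str]], allergen: str) -> list[str]:
    """Return, in sorted order, the food labels associated with an allergen."""
    return sorted(label for label, found in index.items() if allergen in found)


INDEX = build_allergen_index(ROWS)
print(INDEX["milk"])
print(labels_for(INDEX, "Lactose"))
```

A detector trained on this dataset predicts food labels, so a lookup of this kind is one way to turn a predicted label into the intolerance it relates to.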

Data collection

We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.

Fair use

This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

A database of reviewed datasets to investigate the use of metadata and...
data.4tu.nl
zip
Updated Oct 22, 2025
Cite
Florian J. Ellsäßer; Alice Nikuze (2025). A database of reviewed datasets to investigate the use of metadata and adoption of metadata standards for Uncrewed Aerial Vehicle (UAV) data [Dataset]. http://doi.org/10.4121/d845f33d-e199-4c96-8a1f-1db2ad9f2a9c.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/d845f33d-e199-4c96-8a1f-1db2ad9f2a9c.v1
Dataset updated
Oct 22, 2025
Dataset provided by
4TU.ResearchData
Authors
Florian J. Ellsäßer; Alice Nikuze
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The database was developed as part of a research project investigating the use and adoption of metadata standards for UAV (Uncrewed Aerial Vehicle) data. It compiles a list of published datasets containing UAV data or products generated based on UAV data identified through a systematic search of public data repositories. The search covered established data platforms, including DANS, 4TU.ResearchData, DataONE Science Data Bank, DRYAD, Figshare and Zenodo. In addition, a broader internet search using search engines such as Google, DuckDuckGo, Bing, and Perplexity was conducted to identify other publicly accessible UAV datasets. Only datasets with a persistent identifier, such as a DOI (Digital Object Identifier), were included.
Wheat Breeding Multimodal Dataset
zenodo.org
scidb.cn
bin, xls
Updated Feb 11, 2025
Cite
Guofeng Yang; Yu Li; Yong He; Zhenjiang Zhou; Lingzhen Ye; Hui Fang; Xuping Feng (2025). Wheat Breeding Multimodal Dataset [Dataset]. http://doi.org/10.5281/zenodo.14841928
Explore at:
Available download formats: bin, xls
Unique identifier
https://doi.org/10.5281/zenodo.14841928
Dataset updated
Feb 11, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Guofeng Yang; Yu Li; Yong He; Zhenjiang Zhou; Lingzhen Ye; Hui Fang; Xuping Feng
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset is a wheat breeding multimodal dataset, including wheat germplasm data, wheat phenotypic data, wheat cultivation technique data, wheat plant protection technique data, wheat seed price data, UAV remote sensing data, and experimental site weather data. The data sources are field acquisition and online public data.

The wheat germplasm data comes from the Chinese Crop Germplasm Information Network (https://www.cgris.net/). The data on wheat cultivation technique and wheat plant protection technique come from search engines (Google, Bing, Baidu), and the search terms include "wheat cultivation technique, wheat plant protection technique, 小麦栽培技术, 小麦植保技术". The wheat seed historical price data comes from the National Seed Market Monitoring Information Release Platform (http://202.127.45.18/) - China. UAV remote sensing data is the result of further processing after being obtained from field experiments. Weather data comes from meteorological equipment at various agricultural experimental bases and meteorological observation stations of the China Meteorological Administration.

The acquisition and processing of data are described in the relevant part of the manuscript.

This dataset will be continuously updated in the future to help breeding work be carried out efficiently and accelerate the breeding process of excellent varieties.
search-recommendation-ai-agent
huggingface.co
Updated Apr 3, 2025
+ more versions
Cite
DeepNLP (2025). search-recommendation-ai-agent [Dataset]. https://huggingface.co/datasets/DeepNLP/search-recommendation-ai-agent
Explore at:
Dataset updated
Apr 3, 2025
Authors
DeepNLP
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Search Recommendation Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP

This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org. It contains AI agents' meta information, such as each agent's name, website and description, as well as monthly updated Web performance metrics, including average Google and Bing search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/search-recommendation-ai-agent.
🇸🇬 Lazada App Reviews from Google Store
kaggle.com
Updated Nov 14, 2023
Cite
BwandoWando (2023). 🇸🇬 Lazada App Reviews from Google Store [Dataset]. http://doi.org/10.34740/kaggle/ds/3960245
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/ds/3960245
Dataset updated
Nov 14, 2023
Dataset provided by
Kaggle
Authors
BwandoWando
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

From Lazada Wikipedia Page

Lazada Group (t/a Lazada) is an international e-commerce company and one of the largest e-commerce operators in Southeast Asia, with over 10,000 third-party sellers as of November 2014, and 50 million annual active buyers as of September 2019. Backed by Rocket Internet, Maximilian Bittner founded Lazada in 2012 as a marketplace platform that sells inventory to consumers from its own warehouses. Lazada modified its business model the following year to allow third-party retailers to sell their products on its platform too. The marketplace accounted for 65% of the company's sales in 2014.

This dataset contains Lazada app reviews in the Google store retrieved using RAPIDAPI.

Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F3a0439f466bf08a19a3b7713fa0f4049%2Flazada2.png?generation=1699232651135055&alt=media

Usage

This dataset should give a good picture of the public's perception of the app over the years. Using this dataset, we can do the following:

Extract sentiments and trends

Identify which versions of the app received the most positive and the most negative feedback.

Use topic modelling to identify the pain points of the application. (AND MANY MORE!)

Note

Images generated using Bing Image Generator
Virtual E Dataset
figshare.com
zip
Updated Oct 25, 2017
Cite
Seung Seog Han (2017). Virtual E Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5513407.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5513407.v2
Dataset updated
Oct 25, 2017
Dataset provided by
Figshare http://figshare.com/
Authors
Seung Seog Han
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Virtual E Dataset

E dataset (3,317 images) - Diagnosis predicted by CNNs (ResNet-152 + VGG-19; arithmetic mean of both outputs; training dataset: A1)

We created the E dataset to assess the semisupervised learning performance. We conducted a Web-based image search for “tinea,” “onychomycosis,” “nail dystrophy,” “onycholysis,” and “melanonychia” in English, Korean, and Japanese on http://google.com and http://bing.com, and downloaded a total of 15,844 images. From these images, the R-CNNs created a nail dataset of 3,317 images, since we had to discard many images because of low image resolution. The CNNs (model: ResNet-152 + VGG-19; arithmetic mean of both outputs; training dataset: A1) automatically classified images generated by the R-CNNs into six classes (760 onychomycosis, 1,316 nail dystrophy, 363 onycholysis, 185 melanonychia, 424 normal, and 269 others).
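
The "arithmetic mean of both outputs" step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the two probability vectors are made-up stand-ins for the ResNet-152 and VGG-19 outputs, and the function names are hypothetical. The class counts are those given in the description, and the sketch checks that they add up to the 3,317 images reported.

```python
# Six classes and per-class image counts, as listed in the description.
CLASS_COUNTS = {
    "onychomycosis": 760,
    "nail dystrophy": 1316,
    "onycholysis": 363,
    "melanonychia": 185,
    "normal": 424,
    "others": 269,
}
CLASSES = list(CLASS_COUNTS)


def ensemble_mean(probs_a, probs_b):
    """Element-wise arithmetic mean of two models' class probabilities."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def predict(probs_a, probs_b):
    """Return the class whose averaged probability is highest."""
    mean = ensemble_mean(probs_a, probs_b)
    best = max(range(len(mean)), key=mean.__getitem__)
    return CLASSES[best]


# Hypothetical outputs for one image (assumed values, not real model output).
resnet_out = [0.50, 0.30, 0.05, 0.05, 0.05, 0.05]
vgg_out = [0.20, 0.60, 0.05, 0.05, 0.05, 0.05]

TOTAL_IMAGES = sum(CLASS_COUNTS.values())
print(TOTAL_IMAGES)
print(predict(resnet_out, vgg_out))
```

In this example the two models disagree on the top class, and averaging lets the more confident model decide.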
Apparel Dataset
kaggle.com
Updated Apr 26, 2020
Cite
Kais (2020). Apparel Dataset [Dataset]. https://www.kaggle.com/kaiska/apparel-dataset/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 26, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Kais
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset was created in order for me to practice multi-label classification based on Jeremy Howard's FastAi lecture 3. The dataset contains 8 different clothing categories in 9 different colours. The main objective of multi-label classification is to be able to label items found in photos based on these categories.

Content

The dataset consists of 16,170 images that were scraped from Google, Bing and DuckDuckGo, and includes the following categories:

Black Dress: 450
Black Pants: 870
Black Shirt: 715
Black Shoes: 766
Black Shorts: 328
Black Suit: 320
Blue Dress: 502
Blue Pants: 798
Blue Shirt: 741
Blue Shoes: 523
Blue Shorts: 299
Brown Hoodie: 188
Brown Pants: 311
Brown Shoes: 464
Green Pants: 227
Green Shirt: 230
Green Shoes: 455
Green Shorts: 135
Green Suit: 243
Pink Hoodie: 347
Pink Pants: 246
Pink Skirt: 513
Red Dress: 800
Red Hoodie: 349
Red Pants: 308
Red Shirt: 332
Red Shoes: 610
Silver Shoes: 403
Silver Skirt: 361
White Dress: 818
White Pants: 274
White Shoes: 600
White Shorts: 120
White Suit: 354
Yellow Dress: 566
Yellow Shorts: 195
Yellow Skirt: 409
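
For multi-label classification, each combined category name can be split into a colour label and a garment label. The sketch below is an illustration only, not the dataset author's code: `parse_counts` and `multi_hot` are hypothetical helpers. It parses the counts listed above, tallies them per colour and per garment, and builds a multi-hot target vector.

```python
import re
from collections import Counter

# Category counts as listed above, in the form "<Colour> <Garment>: <count>".
CATEGORY_TEXT = """
Black Dress: 450 Black Pants: 870 Black Shirt: 715 Black Shoes: 766 Black Shorts: 328 Black Suit: 320
Blue Dress: 502 Blue Pants: 798 Blue Shirt: 741 Blue Shoes: 523 Blue Shorts: 299
Brown Hoodie: 188 Brown Pants: 311 Brown Shoes: 464
Green Pants: 227 Green Shirt: 230 Green Shoes: 455 Green Shorts: 135 Green Suit: 243
Pink Hoodie: 347 Pink Pants: 246 Pink Skirt: 513
Red Dress: 800 Red Hoodie: 349 Red Pants: 308 Red Shirt: 332 Red Shoes: 610
Silver Shoes: 403 Silver Skirt: 361
White Dress: 818 White Pants: 274 White Shoes: 600 White Shorts: 120 White Suit: 354
Yellow Dress: 566 Yellow Shorts: 195 Yellow Skirt: 409
"""


def parse_counts(text):
    """Return {(colour, garment): count} for every entry found in the text."""
    pattern = re.compile(r"([A-Za-z]+) ([A-Za-z]+): (\d+)")
    return {(c, g): int(n) for c, g, n in pattern.findall(text)}


COUNTS = parse_counts(CATEGORY_TEXT)

# Tally images per colour and per garment type.
by_colour, by_garment = Counter(), Counter()
for (colour, garment), n in COUNTS.items():
    by_colour[colour] += n
    by_garment[garment] += n

# Label vocabulary: all colours followed by all garment types.
VOCAB = sorted(by_colour) + sorted(by_garment)


def multi_hot(colour, garment):
    """Encode one image's two labels as a 0/1 vector over VOCAB."""
    return [1 if label in (colour, garment) else 0 for label in VOCAB]


print(sum(COUNTS.values()), len(by_colour), len(by_garment))
print(multi_hot("Red", "Dress"))
```

Splitting the names this way means a model predicts over 17 labels (9 colours and 8 garment types) rather than over each colour and garment combination separately.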

Acknowledgements

While searching the internet for a good dataset to apply multi-label classification to, I stumbled upon pyimagesearch's article on multi-label classification with Keras, in which Adrian used a very simple and small dataset containing 3 clothing categories. To expand on that dataset, I combined it with trolukovich's dataset and my own, which I collected by scraping Google and Bing using cwerner's fastclass package.
Australian National Data Service
data.wu.ac.at
gimi9.com
+1 more
html, xml
Updated Apr 8, 2015
+ more versions
Cite
Australian National Data Service (2015). Australian National Data Service [Dataset]. https://data.wu.ac.at/odso/data_gov_au/MjRkNWIxOWYtMmZkMy00M2NjLWIwYzctNmVhYmUzOGM0YjA1
Explore at:
Available download formats: html, xml
Dataset updated
Apr 8, 2015
Dataset provided by
Australian Research Data Commons
Area covered
Australia
Description
Research Data Australia is an Internet-based collection designed to promote visibility of Australian research data in search engines such as Google and Bing. Research Data Australia aims to provide a comprehensive window into the Australian Research Data Commons. It provides connections between data, projects, researchers and services across organisations and disciplines.

Research is producing larger and more complex data than ever before. It is imperative that these data outputs are effectively managed and shared. Better data – better described, more connected, more integrated and organised, more accessible, more easily used for new purposes – allows new questions to be investigated, larger issues to be investigated, and data landscapes to be explored.
