32 datasets found
  1. AI-Powered Search Features: Preparing for Google SGE and Bing Chat...

    • caseysseo.com
    txt
    Updated Aug 21, 2025
    Cite
    Casey Miller (2025). AI-Powered Search Features: Preparing for Google SGE and Bing Chat Integration [Dataset]. https://caseysseo.com/ai-powered-search-features-preparing-for-google-sge-and-bing-chat-integration
    Available download formats: txt
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Variables measured
    Colorado Springs Mobile Search Growth, Increase in Quality Leads from Bing Chat, Increase in Leads for Colorado Springs Contractor, Increase in Clicks for Local Businesses in Google SGE, Average Number of Source Citations per Google SGE Answer, Percentage of Local Service Queries with AI-Powered Features
    Measurement technique
    Qualitative and quantitative data from search engine performance monitoring, First-hand observations from local business optimization campaigns, Industry research and analysis
    Description

    This dataset provides detailed information about the rise of AI-powered search features, such as Google's Search Generative Experience (SGE) and Bing's Chat integration, and how local businesses can optimize their online presence to capitalize on these new search trends. The dataset covers the current state of AI search features, the unique opportunities and challenges for local businesses, and actionable strategies for improving visibility in this evolving search landscape.

  2. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Mar 1, 2023
    Cite
    Fabian Haak; Philipp Schaer (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. http://doi.org/10.5281/zenodo.7682915
    Available download formats: csv
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fabian Haak; Philipp Schaer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. Each AllSides balanced news roundup features three expert-selected U.S. news articles from sources with different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news feature aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. The collected data further include headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles.
    Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.
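
    The three label counts add up to the reported total of 21,747 articles, and the classes are not balanced. A quick Python check of that arithmetic, with illustrative variable and function names that are not part of the dataset:

```python
# Label counts reported for the AllSides balanced news dataset (Qbias, Dataset 1).
LABEL_COUNTS = {"left": 10_273, "right": 7_222, "center": 4_252}
TOTAL_REPORTED = 21_747


def label_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each label's share of the total as a percentage."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# The per-label counts should sum to the reported article total.
assert sum(LABEL_COUNTS.values()) == TOTAL_REPORTED

for label, pct in label_shares(LABEL_COUNTS).items():
    print(f"{label}: {pct:.1f}%")
# left: 47.2%
# right: 33.2%
# center: 19.6%
```

    Roughly 47% of the articles are labeled left, 33% right and 20% center, which is worth keeping in mind when using the file for classifier training.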

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains more recent versions of the dataset with additional tags (such as the URL of the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions were retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions at the respective positions returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server, whose location is saved in "location".
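
    Given that column layout, per-engine and per-topic tallies of suggestions.csv can be sketched with Python's standard csv module. The column names come from the description; the engine labels ("google", "bing"), the datetime format, the sample rows and the function name are assumptions made for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_suggestions(csv_file) -> tuple[Counter, float]:
    """Count suggestions per search engine and the mean number per root term.

    Assumes the columns described for suggestions.csv:
    root_term, query_input, search_engine, query_suggestion, rank, datetime, location.
    """
    per_engine: Counter = Counter()
    per_root: defaultdict = defaultdict(int)
    for row in csv.DictReader(csv_file):
        per_engine[row["search_engine"]] += 1
        per_root[row["root_term"]] += 1
    mean_per_root = sum(per_root.values()) / len(per_root) if per_root else 0.0
    return per_engine, mean_per_root


# Illustrative rows only, not taken from the real file.
SAMPLE = """root_term,query_input,search_engine,query_suggestion,rank,datetime,location
democrats,democrats a,google,democrats and recession,1,2022-11-30 10:00,Iowa
democrats,democrats a,bing,democrats and recession,1,2022-11-30 10:00,Iowa
communism,communism,google,communism definition,1,2022-11-30 10:05,Iowa
"""

engines, mean_per_root = summarize_suggestions(io.StringIO(SAMPLE))
print(engines)        # Counter({'google': 2, 'bing': 1})
print(mean_per_root)  # 1.5
```

    On the real file, pass `open("suggestions.csv", newline="", encoding="utf-8")` instead of the in-memory sample (the encoding is an assumption). If the file matches the description, the per-engine counts should come to 318,185 and 353,484.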

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
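
    The expansion scheme explains the figure of up to 270 suggestions per topic and engine: the root query plus its 26 single-letter extensions gives 27 inputs, each returning up to ten suggestions (27 × 10 = 270). A minimal Python sketch of generating those inputs; the function name is hypothetical, and the collection code in the Qbias repository may be structured differently.

```python
import string

SUGGESTIONS_PER_INPUT = 10  # autocomplete suggestions retrieved per query input


def expand_root_query(root: str) -> list[str]:
    """Return the root query followed by the root extended with each letter a-z."""
    return [root] + [f"{root} {letter}" for letter in string.ascii_lowercase]


inputs = expand_root_query("democrats")
print(len(inputs))                          # 27
print(inputs[:3])                           # ['democrats', 'democrats a', 'democrats b']
print(len(inputs) * SUGGESTIONS_PER_INPUT)  # 270, the upper bound per topic and engine
```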

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.

  3. comp-serp-data

    • huggingface.co
    Cite
    Goker Cebeci, comp-serp-data [Dataset]. https://huggingface.co/datasets/goker/comp-serp-data
    Authors
    Goker Cebeci
    Description

    Comprehensive SERP Data

    This dataset contains comprehensive search engine ranking data collected from Google and Bing, along with extracted technical and content features for analyzing search engine ranking algorithms.

    📊 Dataset Overview

    - Total Records: 14,465 search results
    - Search Engines: Google (5,895 results) and Bing (8,570 results)
    - Keywords: 500 diverse search queries
    - Features: 20 features including technical scores, content analysis, and ranking metadata…

    See the full description on the dataset page: https://huggingface.co/datasets/goker/comp-serp-data.

  4. Schema Markup Implementation: Structured Data Strategies for Google and Bing...

    • caseysseo.com
    txt
    Updated Aug 21, 2025
    Cite
    Casey Miller (2025). Schema Markup Implementation: Structured Data Strategies for Google and Bing Visibility [Dataset]. https://caseysseo.com/schema-markup-implementation-structured-data-strategies-for-google-and-bing-visibility
    Available download formats: txt
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    Colorado Springs
    Variables measured
    Content Word Count, Click-through Rate Improvement, Trust Factor for Online Reviews, Military Population in Colorado Springs, Mobile Search Growth in Colorado Springs, Percentage of Websites Using Structured Data
    Measurement technique
    Reviewed industry reports and guidelines from leading search engines, Conducted customer surveys to gather feedback on purchasing decisions, Analyzed historical website performance data and search metrics
    Description

    This dataset provides comprehensive information on the importance of schema markup implementation for improving search visibility on Google and Bing. It covers the benefits of schema markup, the most impactful schema types for businesses, and effective implementation strategies. The dataset includes details on the creator, publisher, content coverage, data sources, and quantitative metrics related to the schema markup impact.

  5. hyperlinks

    • huggingface.co
    Cite
    Goker Cebeci, hyperlinks [Dataset]. https://huggingface.co/datasets/goker/hyperlinks
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Goker Cebeci
    Description

    Hyperlinks Dataset

    This dataset contains subpage links, their features, and corresponding search engine rankings from Google and Bing. The data was collected as part of the research project: "Accessible Hyperlinks and Search Engine Rankings: An Empirical Investigation".

    Dataset Description

    This dataset is designed to facilitate research on the relationship between website accessibility, specifically hyperlink accessibility, and search engine rankings. It consists of… See the full description on the dataset page: https://huggingface.co/datasets/goker/hyperlinks.

  6. Data from: Inventory of online public databases and repositories holding...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

    - establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals;
    - compare how much data is in institutional vs. domain-specific vs. federal platforms;
    - determine which repositories are recommended by top journals that require or recommend the publication of supporting data;
    - ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
    To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and hold some amount of ag data, the team analysed resources including re3data, LibGuides, and ARS lists. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain-specific USDA / ARS datasets and databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then used search engines such as Bing and Google to look for non-USDA / non-federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories, using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university's website to see whether the institution had a repository for its unique, independent research data, if this was not apparent in the initial web search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories; results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
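
    The Boolean exclusion queries described in the search methods can be generated mechanically, one per search phrase. A minimal Python sketch; the function name is hypothetical, and the exact operator syntax an engine accepts is an assumption to check (Bing documents NOT, while Google's documented exclusion operator is a leading minus sign).

```python
TERMS = ["agricultural data", "ag data", "scientific data"]


def boolean_queries(terms: list[str], exclude: str = "USDA") -> list[str]:
    """Build one quoted-phrase query per term that excludes a keyword with NOT."""
    return [f'"{term}" NOT {exclude}' for term in terms]


for query in boolean_queries(TERMS):
    print(query)
# "agricultural data" NOT USDA
# "ag data" NOT USDA
# "scientific data" NOT USDA
```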
    General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next, we searched the internet for open general data repositories using a variety of search engines; repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

    Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA researchers published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA websites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see whether they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned more than 100 and more than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review:

    - Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
    - There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    - Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See the included README file for descriptions of each individual data file in this dataset.

    Resources in this dataset:

    - Journals (File Name: Journals.csv)
    - Journals - Recommended repositories (File Name: Repos_from_journals.csv)
    - TDWG presentation (File Name: TDWG_Presentation.pptx)
    - Domain Specific ag data sources (File Name: domain_specific_ag_databases.csv)
    - Data Dictionary for Ag Data Repository Inventory (File Name: Ag_Data_Repo_DD.csv)
    - General repositories containing ag data (File Name: general_repos_1.csv)
    - README and file inventory (File Name: README_InventoryPublicDBandREepAgData.txt)
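
    The evaluation step applies four fixed thresholds to each repository and search term: a term's results making up at least 1% or at least 5% of the collection, and a term returning more than 100 or more than 500 results. A minimal Python sketch of that bookkeeping; the class name, repository name and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TermCoverage:
    repository: str
    term: str
    hits: int   # datasets returned for the search term
    total: int  # total datasets in the repository

    @property
    def percent(self) -> float:
        """Share of the whole collection returned by this term, in percent."""
        return 100 * self.hits / self.total

    def flags(self) -> dict[str, bool]:
        """Apply the thresholds named in the evaluation step."""
        return {
            ">=1% of collection": self.percent >= 1,
            ">=5% of collection": self.percent >= 5,
            ">100 results": self.hits > 100,
            ">500 results": self.hits > 500,
        }


# Hypothetical numbers for illustration only.
cov = TermCoverage("ExampleRepo", "agriculture", hits=620, total=10_000)
print(round(cov.percent, 1))  # 6.2
print(cov.flags())            # all four flags are True for this example
```

    A repository meeting both the ">=5% of collection" and ">500 results" flags corresponds to the group that GBIF, ICPSR and ORNL DAAC fell into.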

  7. Interface Element Frequencies in Search Engine Results Pages (SERPs) Across...

    • rdm.inesctec.pt
    Updated Jul 22, 2025
    Cite
    (2025). Interface Element Frequencies in Search Engine Results Pages (SERPs) Across Query Intents, Search Engines and Languages [Dataset]. https://rdm.inesctec.pt/dataset/cs-2025-006
    Dataset updated
    Jul 22, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the data produced for the dissertation "User Interface Variations in Search Engine Results Pages Across Types of Search Queries and Search Engines". The project was conducted by student Adelaide Miranda Santos at FEUP, University of Porto, as part of the Master's in Informatics and Computing Engineering. The primary objective of this work is to study interface variations in search engine results pages (SERPs) across different search engines and types of search queries. To this end, nearly 8,000 SERPs were captured using the ORCAS-I-gold dataset across six leading web search engines: Google, Microsoft Bing, Yandex, Yahoo!, Baidu, and DuckDuckGo. For each captured SERP, the number of occurrences of each interface element was recorded. Additionally, to analyze how the language of a search query affects SERP composition in Yandex and Baidu, the original English queries were translated into Russian and Simplified Chinese.

    The dataset is organized in the following folders:

    - Search Query Dataset Translation: contains the search queries from the ORCAS-I-gold dataset translated into Russian and Simplified Chinese. The translation was made using ChatGPT-4o and verified by native speakers. In addition to the translated queries, the complete original ORCAS-I-gold dataset is also included as an independent resource.
    - SERP Captures: includes HTML files of the search engine results pages collected from Baidu, Microsoft Bing, DuckDuckGo, Google, Yahoo!, and Yandex. Each top-level subfolder is named after the respective search engine. Within each of these, there are folders named according to the language and the query intent associated with the search query. These folders contain the corresponding SERP HTML files. File names represent the search queries and may be either encoded or displayed as in the original dataset.
    - Occurrence of Elements per SERP: for each captured SERP, we recorded the frequency of each interface element.
    This data is organized in a relational database structure composed of the following CSV files:

    - elements.csv: lists all identified SERP elements along with their corresponding IDs, categories, types, and subtypes (if applicable).
    - identifiers.csv: contains the selectors or identifiers used for automatic detection of each element, along with their associated element ID, identifier ID, and the corresponding search engine ID.
    - intents.csv: maps query intent names to their corresponding intent IDs.
    - search-engines.csv: maps search engine names to their corresponding IDs.
    - main.csv: records the frequency of each element in each captured SERP. Each row represents an observation and includes the following fields: element ID, identifier ID, search engine ID, query language, intent ID, query ID (as defined in the ORCAS-I-gold dataset), and the number of occurrences.
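
    Because main.csv stores only IDs, reading it usually means joining it with the lookup tables. A minimal Python sketch that totals occurrences per search engine and element, assuming header names such as element_id, search_engine_id, name and occurrences (the description lists the fields but not their exact column headers); the function names and the element names in the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_names(csv_file, id_col: str, name_col: str = "name") -> dict[str, str]:
    """Map IDs to names from a lookup table such as elements.csv or search-engines.csv."""
    return {row[id_col]: row[name_col] for row in csv.DictReader(csv_file)}


def occurrences_by_engine(main_file, elements: dict[str, str],
                          engines: dict[str, str]) -> dict[tuple[str, str], int]:
    """Sum occurrences per (search engine, element) over every SERP row in main.csv."""
    totals: defaultdict = defaultdict(int)
    for row in csv.DictReader(main_file):
        key = (engines[row["search_engine_id"]], elements[row["element_id"]])
        totals[key] += int(row["occurrences"])
    return dict(totals)


# Hypothetical in-memory stand-ins for the three CSV files.
ELEMENTS = "element_id,name\n1,ad\n2,image carousel\n"
ENGINES = "search_engine_id,name\n1,Google\n2,Bing\n"
MAIN = (
    "element_id,identifier_id,search_engine_id,language,intent_id,query_id,occurrences\n"
    "1,10,1,en,1,q1,3\n"
    "1,10,1,en,2,q2,1\n"
    "2,20,2,en,1,q1,2\n"
)

elements = load_names(io.StringIO(ELEMENTS), "element_id")
engines = load_names(io.StringIO(ENGINES), "search_engine_id")
totals = occurrences_by_engine(io.StringIO(MAIN), elements, engines)
print(totals)  # {('Google', 'ad'): 4, ('Bing', 'image carousel'): 2}
```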

  8. Search Engines Comparison and Websites Performance

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jul 1, 2023
    Cite
    Georgios Ntimo; Vasilios Ntararas (2023). Search Engines Comparison and Websites Performance [Dataset]. http://doi.org/10.5281/zenodo.8102700
    Available download formats: bin
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Georgios Ntimo; Vasilios Ntararas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 200 search results extracted from Google and Bing (100 from Google and 100 from Bing). The search terms were selected from the 10 most-searched keywords of 2021 according to Google Trends data. The remaining sheets record the performance of the websites on three technical evaluation aspects: SEO, speed, and security. The performance data were collected with the CheckBot crawling tool. The whole dataset can help information retrieval scientists compare the two engines in terms of their position/ranking and their performance on these factors.
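
    One way to compare the two engines' rankings for a keyword is to measure how many result URLs they share and how far apart the shared URLs are ranked. The dataset does not prescribe a measure; the sketch below swaps in Jaccard overlap and a simple rank difference, and the function names and URLs are hypothetical.

```python
def _first_ranks(urls: list[str]) -> dict[str, int]:
    """Map each URL to the 1-based rank of its first appearance."""
    ranks: dict[str, int] = {}
    for position, url in enumerate(urls, start=1):
        ranks.setdefault(url, position)
    return ranks


def compare_rankings(google: list[str], bing: list[str]) -> tuple[float, dict[str, int]]:
    """Jaccard overlap of two result lists, plus Google rank minus Bing rank for shared URLs."""
    g, b = _first_ranks(google), _first_ranks(bing)
    shared = g.keys() & b.keys()
    union = g.keys() | b.keys()
    overlap = len(shared) / len(union) if union else 0.0
    rank_gap = {url: g[url] - b[url] for url in sorted(shared)}
    return overlap, rank_gap


# Hypothetical top results for a single keyword.
overlap, rank_gap = compare_rankings(
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
)
print(overlap)   # 0.5
print(rank_gap)  # {'a.example': -1, 'b.example': 1}
```

    A negative gap means Google ranked the URL higher than Bing did. The per-site SEO, speed and security scores could then be compared across the shared and engine-specific URLs.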

    For more information about the rationale behind the structure of the dataset, please contact the Information Management Lab of the University of West Attica.

    Contact Persons: Vasilis Ntararas (lb17032@uniwa.gr) , Georgios Ntimo (lb17100@uniwa.gr) and Ioannis C. Drivas (idrivas@uniwa.gr)

  9. Human Interaction Image (HII) dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Junnan Li; Yongkang Wong; Qi Zhao; Mohan S. Kankanhalli (2020). Human Interaction Image (HII) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_832379
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    National University of Singapore
    University of Minnesota
    Authors
    Junnan Li; Yongkang Wong; Qi Zhao; Mohan S. Kankanhalli
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Human Interaction Image (HII) dataset is a new dataset containing Web images from commercial search engines (Google, Bing and Flickr). We used keyword search to collect images corresponding to four types of interactions: handshake, highfive, hug, and kiss. We then manually filtered out irrelevant images. The dataset contains 2,410 images, with at least 550 images per interaction.

    The dataset can be applied to, but is not limited to, the following research areas:

    interaction recognition/prediction

    action recognition

    video analysis

    transfer learning

    Please cite the following paper if you use the HII dataset in your work (papers, articles, reports, books, software, etc):

    J. Li, Y. Wong, Q. Zhao, M. Kankanhalli. Attention Transfer from Web Images for Video Recognition. ACM Multimedia, 2017. http://doi.org/10.1145/3123266.3123432

  10. Indianfoodnet Dataset

    • universe.roboflow.com
    zip
    Updated Dec 4, 2023
    Cite
    IndianFoodNet (2023). Indianfoodnet Dataset [Dataset]. https://universe.roboflow.com/indianfoodnet/indianfoodnet/model/1
    Available download formats: zip
    Dataset updated
    Dec 4, 2023
    Dataset authored and provided by
    IndianFoodNet
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Indian Dishes Bounding Boxes
    Description

    IndianFoodNet-30

    About IndianFoodNet-30

    IndianFoodNet-30 was created by Ritu Agarwal, Nikunj Bansal, Tanupriya Choudhury, Tanmay Sarkar & Neelu Jyothi Ahuja with the goal of building an Indian food detection model. It contains more than 5,500 images of 30 popular Indian food items.

    Data collection

    We used search engines (Google and Bing) to crawl for suitable images, using JavaScript queries for each food item on the list we created. Images with incomplete RGB channels were removed, and the images collected from the different search engines were compiled. Many of the downloaded images were irrelevant to the purpose, especially those containing a lot of text; we deployed the EAST text detector to filter out such images. Finally, a comprehensive manual inspection was conducted to ensure the relevance of the images in the dataset.
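
    The automated part of that collection pipeline amounts to three list operations: compile the per-engine results, drop images whose channel count is not three, and drop text-heavy images. A minimal Python sketch under stated assumptions: each candidate carries a precomputed text_area_ratio standing in for the EAST detector's output (running EAST itself is not shown), the 0.2 cut-off is hypothetical, and treating anything other than exactly three channels as incomplete also excludes RGBA images.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the description does not say how EAST output was thresholded.
TEXT_AREA_LIMIT = 0.2


@dataclass
class Candidate:
    path: str
    engine: str             # search engine the image came from
    channels: int           # colour channels in the decoded image
    text_area_ratio: float  # share of the image covered by detected text (assumed precomputed)


def filter_candidates(*engine_results: list[Candidate]) -> list[Candidate]:
    """Compile per-engine results, drop images without exactly three colour
    channels, then drop text-heavy images. Manual inspection would still follow."""
    compiled = [c for results in engine_results for c in results]
    rgb = [c for c in compiled if c.channels == 3]
    return [c for c in rgb if c.text_area_ratio <= TEXT_AREA_LIMIT]


google = [Candidate("g1.jpg", "google", 3, 0.05), Candidate("g2.png", "google", 1, 0.0)]
bing = [Candidate("b1.jpg", "bing", 3, 0.6), Candidate("b2.jpg", "bing", 3, 0.1)]
kept = filter_candidates(google, bing)
print([c.path for c in kept])  # ['g1.jpg', 'b2.jpg']
```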

    Fair use

    This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

    Citation

    If you find our dataset useful, please cite us as: @dataset{dataset, author = {Agarwal, Ritu and Bansal, Nikunj and Choudhury, Tanupriya and Sarkar, Tanmay and J.Ahuja, Neelu}, year = {2023}, title = {IndianFoodNet-30 Dataset}, publisher = {Roboflow Universe}, url = {https://universe.roboflow.com/indianfoodnet/indianfoodnet}, }

  11. marketing-ai-agent

    • huggingface.co
    Updated Aug 30, 2025
    + more versions
    Cite
    DeepNLP (2025). marketing-ai-agent [Dataset]. https://huggingface.co/datasets/DeepNLP/marketing-ai-agent
    Dataset updated
    Aug 30, 2025
    Authors
    DeepNLP
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Marketing Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP

    This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org, which contains AI agents' meta information such as the agent's name, website, and description, as well as monthly updated web performance metrics, including Google and Bing average search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful for AI… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/marketing-ai-agent.

  12. Indian_food Dataset

    • universe.roboflow.com
    zip
    Updated Jul 16, 2024
    Cite
    IndianFood (2024). Indian_food Dataset [Dataset]. https://universe.roboflow.com/indianfood/indian_food-pwzlc/dataset/2
    Available download formats: zip
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    IndianFood
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Indian Food Bounding Boxes
    Description

    IndianFood-7

    About IndianFood-7

    IndianFood-7 was created by Ritu Agarwal, Nikunj Bansal, Tanmay Sarkar, Tanupriya Choudhury and Neelu Jyothi Ahuja with the goal of building an Indian food detection model. It contains more than 800 images of 7 popular Indian food items.

    Data collection

    We used search engines (Google and Bing) to crawl for suitable images, using JavaScript queries for each food item on the list we created. Images with incomplete RGB channels were removed, and the images collected from the different search engines were compiled. Many of the downloaded images were irrelevant to the purpose, especially those containing a lot of text; we deployed the EAST text detector to filter out such images. Finally, a comprehensive manual inspection was conducted to ensure the relevance of the images in the dataset.

    Fair use

    This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

  13. Food_new Dataset

    • universe.roboflow.com
    zip
    Updated Jul 16, 2024
    Cite
    Allergen30 (2024). Food_new Dataset [Dataset]. https://universe.roboflow.com/allergen30/food_new-uuulf/dataset/2
    Available download formats: zip
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Allergen30
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Food Bounding Boxes
    Description

    Allergen30

    About Allergen30

    Allergen30 was created by Mayank Mishra, Nikunj Bansal, Tanmay Sarkar and Tanupriya Choudhury with the goal of building a robust detection model that can help people avoid possible allergic reactions.

    It contains more than 6,000 images of 30 commonly consumed food items that can cause adverse reactions in the human body. This dataset is one of the first research attempts at training a deep-learning-based computer vision model to detect the presence of such food items in images. It also serves as a benchmark for evaluating how well object detection methods learn the otherwise difficult visual cues associated with food items.

    Description of class labels

    Many food items are associated with specific food intolerances that can trigger an allergic reaction. These intolerances primarily include lactose, histamine, gluten, salicylate, caffeine and ovomucoid intolerance. (Figure: Food intolerance, https://github.com/mmayank74567/mmayank74567.github.io/blob/master/images/FoodIntol.png?raw=true)

    The following table describes the 30 class labels in our dataset.

    S. No. | Allergen | Food label | Description
    --- | --- | --- | ---
    1 | Ovomucoid | egg | Images of egg with yolk (e.g. sunny-side-up eggs)
    2 | Ovomucoid | whole_egg_boiled | Images of soft- and hard-boiled eggs
    3 | Lactose/Histamine | milk | Images of milk in a glass
    4 | Lactose | icecream | Images of ice cream scoops
    5 | Lactose | cheese | Images of Swiss cheese
    6 | Lactose/Caffeine | milk_based_beverage | Images of tea/coffee with milk in a cup/glass
    7 | Lactose/Caffeine | chocolate | Images of chocolate bars
    8 | Caffeine | non_milk_based_beverage | Images of soft drinks and tea/coffee without milk in a cup/glass
    9 | Histamine | cooked_meat | Images of cooked meat
    10 | Histamine | raw_meat | Images of raw meat
    11 | Histamine | alcohol | Images of alcohol bottles
    12 | Histamine | alcohol_glass | Images of wine glasses with alcohol
    13 | Histamine | spinach | Images of spinach bundles
    14 | Histamine | avocado | Images of avocado sliced in half
    15 | Histamine | eggplant | Images of eggplant
    16 | Salicylate | blueberry | Images of blueberries
    17 | Salicylate | blackberry | Images of blackberries
    18 | Salicylate | strawberry | Images of strawberries
    19 | Salicylate | pineapple | Images of pineapple
    20 | Salicylate | capsicum | Images of bell pepper
    21 | Salicylate | mushroom | Images of mushrooms
    22 | Salicylate | dates | Images of dates
    23 | Salicylate | almonds | Images of almonds
    24 | Salicylate | pistachios | Images of pistachios
    25 | Salicylate | tomato | Images of tomatoes and tomato slices
    26 | Gluten | roti | Images of roti
    27 | Gluten | pasta | Images of one serving of penne pasta
    28 | Gluten | bread | Images of bread slices
    29 | Gluten | bread_loaf | Images of bread loaf
    30 | Gluten | pizza | Images of pizza and pizza slices
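
    The label-to-intolerance mapping in the class-label table can be encoded as a simple lookup, for example to flag which labels returned by a detector are relevant to a given user. This is a sketch only: it assumes the detector emits the food-label strings exactly as listed in the table, and the helper names `labels_for` and `flag_detections` are hypothetical, not part of the dataset.

```python
from typing import Dict, Iterable, List, Tuple

# Food label -> intolerance(s), transcribed from the class-label table.
ALLERGENS: Dict[str, Tuple[str, ...]] = {
    "egg": ("Ovomucoid",), "whole_egg_boiled": ("Ovomucoid",),
    "milk": ("Lactose", "Histamine"), "icecream": ("Lactose",),
    "cheese": ("Lactose",), "milk_based_beverage": ("Lactose", "Caffeine"),
    "chocolate": ("Lactose", "Caffeine"), "non_milk_based_beverage": ("Caffeine",),
    "cooked_meat": ("Histamine",), "raw_meat": ("Histamine",),
    "alcohol": ("Histamine",), "alcohol_glass": ("Histamine",),
    "spinach": ("Histamine",), "avocado": ("Histamine",),
    "eggplant": ("Histamine",), "blueberry": ("Salicylate",),
    "blackberry": ("Salicylate",), "strawberry": ("Salicylate",),
    "pineapple": ("Salicylate",), "capsicum": ("Salicylate",),
    "mushroom": ("Salicylate",), "dates": ("Salicylate",),
    "almonds": ("Salicylate",), "pistachios": ("Salicylate",),
    "tomato": ("Salicylate",), "roti": ("Gluten",), "pasta": ("Gluten",),
    "bread": ("Gluten",), "bread_loaf": ("Gluten",), "pizza": ("Gluten",),
}


def labels_for(intolerance: str) -> List[str]:
    """All class labels associated with one intolerance, sorted alphabetically."""
    return sorted(label for label, tags in ALLERGENS.items() if intolerance in tags)


def flag_detections(detected: Iterable[str], intolerances: Iterable[str]) -> List[str]:
    """Keep detected labels linked to any of the user's intolerances."""
    wanted = set(intolerances)
    # Unknown labels map to an empty tuple and are never flagged.
    return [label for label in detected if wanted.intersection(ALLERGENS.get(label, ()))]
```

    A dictionary keyed by label keeps each lookup constant-time; labels such as `milk` that map to two intolerances are flagged for either one.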

    Data collection

    We used search engines (Google and Bing) to crawl for suitable images, running JavaScript queries for each food item on the list we had created. Images with incomplete RGB channels were removed, and the images collected from the different search engines were compiled. Many downloaded images were irrelevant to our purpose, especially those containing a lot of text, so we used the EAST text detector to filter them out. Finally, a thorough manual inspection was conducted to ensure that the images in the dataset are relevant.

    Fair use

    This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes. (Adapted from Christopher Thomas.)


  14. A database of reviewed datasets to investigate the use of metadata and...

    • data.4tu.nl
    zip
    Updated Oct 22, 2025
    Cite
    Florian J. Ellsäßer; Alice Nikuze (2025). A database of reviewed datasets to investigate the use of metadata and adoption of metadata standards for Uncrewed Aerial Vehicle (UAV) data [Dataset]. http://doi.org/10.4121/d845f33d-e199-4c96-8a1f-1db2ad9f2a9c.v1
    Available download formats: zip
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    4TU.ResearchData
    Authors
    Florian J. Ellsäßer; Alice Nikuze
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database was developed as part of a research project investigating the use and adoption of metadata standards for UAV (Uncrewed Aerial Vehicle) data. It compiles published datasets that contain UAV data, or products derived from UAV data, identified through a systematic search of public data repositories. The search covered established data platforms, including DANS, 4TU.ResearchData, DataONE, Science Data Bank, DRYAD, Figshare and Zenodo. In addition, a broader internet search using search engines such as Google, DuckDuckGo, Bing and Perplexity was conducted to identify other publicly accessible UAV datasets. Only datasets with a persistent identifier, such as a DOI (Digital Object Identifier), were included.
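
    The inclusion criterion (only datasets with a persistent identifier such as a DOI) can be illustrated with a simple pattern match. This is a sketch under assumptions: each candidate record is assumed to be a dict with a hypothetical `identifier` field, and the regular expression is a common approximation of DOI syntax (the `10.` prefix, a registrant code of 4 to 9 digits, a slash and a suffix). It does not check that a DOI actually resolves.

```python
import re
from typing import Dict, List, Optional

# Approximate DOI syntax: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_doi(identifier: str) -> Optional[str]:
    """Return the first DOI-like substring, e.g. from a doi.org URL."""
    match = DOI_PATTERN.search(identifier or "")
    return match.group(0) if match else None


def keep_with_doi(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only records whose (hypothetical) 'identifier' field holds a DOI."""
    return [r for r in records if extract_doi(r.get("identifier", ""))]
```

    Searching rather than matching the whole string lets the same check accept both bare DOIs and `doi.org` URLs such as the one in this entry's citation.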

  15. Wheat Breeding Multimodal Dataset

    • zenodo.org
    • scidb.cn
    bin, xls
    Updated Feb 11, 2025
    Cite
    Guofeng Yang; Yu Li; Yong He; Zhenjiang Zhou; Lingzhen Ye; Hui Fang; Xuping Feng (2025). Wheat Breeding Multimodal Dataset [Dataset]. http://doi.org/10.5281/zenodo.14841928
    Available download formats: bin, xls
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guofeng Yang; Yu Li; Yong He; Zhenjiang Zhou; Lingzhen Ye; Hui Fang; Xuping Feng
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This multimodal wheat breeding dataset includes wheat germplasm data, phenotypic data, cultivation technique data, plant protection technique data, seed price data, UAV remote sensing data, and weather data from the experimental sites. The data were obtained through field acquisition and from public online sources.

    The wheat germplasm data come from the Chinese Crop Germplasm Information Network (https://www.cgris.net/). The wheat cultivation technique and plant protection technique data come from search engines (Google, Bing, Baidu), using the search terms "wheat cultivation technique", "wheat plant protection technique", "小麦栽培技术" (wheat cultivation technique) and "小麦植保技术" (wheat plant protection technique). The historical wheat seed price data come from the National Seed Market Monitoring Information Release Platform (http://202.127.45.18/), China. The UAV remote sensing data were acquired in field experiments and then further processed. The weather data come from meteorological equipment at various agricultural experimental bases and from meteorological observation stations of the China Meteorological Administration.

    The acquisition and processing of data are described in the relevant part of the manuscript.

    The dataset will be updated continuously to support efficient breeding work and to accelerate the breeding of superior varieties.

  16. search-recommendation-ai-agent

    • huggingface.co
    Updated Apr 3, 2025
    Cite
    DeepNLP (2025). search-recommendation-ai-agent [Dataset]. https://huggingface.co/datasets/DeepNLP/search-recommendation-ai-agent
    Dataset updated
    Apr 3, 2025
    Authors
    DeepNLP
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Search Recommendation Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP

    This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org. It contains AI agents' meta information, such as each agent's name, website and description, as well as monthly updated web performance metrics, including average Google and Bing search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/search-recommendation-ai-agent.

  17. 🇸🇬 Lazada App Reviews from Google Store

    • kaggle.com
    Updated Nov 14, 2023
    Cite
    BwandoWando (2023). 🇸🇬 Lazada App Reviews from Google Store [Dataset]. http://doi.org/10.34740/kaggle/ds/3960245
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Kaggle
    Authors
    BwandoWando
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    From the Lazada Wikipedia page:

    Lazada Group (t/a Lazada) is an international e-commerce company and one of the largest e-commerce operators in Southeast Asia, with over 10,000 third-party sellers as of November 2014, and 50 million annual active buyers as of September 2019. Backed by Rocket Internet, Maximilian Bittner founded Lazada in 2012 as a marketplace platform that sells inventory to consumers from its own warehouses. Lazada modified its business model the following year to allow third-party retailers to sell their products on its platform too. The marketplace accounted for 65% of the company's sales in 2014.

    This dataset contains Lazada app reviews from the Google Play Store, retrieved using RapidAPI.


    Usage

    This dataset should give a good picture of the public's perception of the app over the years. Using this dataset, we can:

    1. Extract sentiments and trends
    2. Identify which versions of the app received the most positive and the most negative feedback.
    3. Use topic modelling to identify the application's pain points, and much more.
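
    As a sketch of the second use case, the snippet below averages star ratings per app version and reports the best- and worst-rated versions. The column names `version` and `score` are assumptions, not confirmed fields of this dataset, and the star rating is used as a simple proxy for positive feedback rather than a text-based sentiment score.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def rank_versions(
    reviews: List[dict], min_reviews: int = 1
) -> Tuple[Optional[str], Optional[str], Dict[str, float]]:
    """Return (best version, worst version, average star rating per version)."""
    scores: Dict[str, List[int]] = defaultdict(list)
    for review in reviews:
        version = review.get("version")  # hypothetical column name
        if version:  # skip reviews with no recorded app version
            scores[version].append(review["score"])  # hypothetical 1-5 rating
    # Ignore versions with too few reviews to give a stable average.
    averages = {v: mean(s) for v, s in scores.items() if len(s) >= min_reviews}
    if not averages:
        return None, None, {}
    best = max(averages, key=averages.get)
    worst = min(averages, key=averages.get)
    return best, worst, averages
```

    Raising `min_reviews` keeps versions with only one or two reviews from dominating the ranking.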

    Note

    Images generated using Bing Image Generator

  18. Virtual E Dataset

    • figshare.com
    zip
    Updated Oct 25, 2017
    Cite
    Seung Seog Han (2017). Virtual E Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5513407.v2
    Available download formats: zip
    Dataset updated
    Oct 25, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Seung Seog Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Virtual E Dataset: E dataset (3,317 images), with diagnoses predicted by CNNs (ResNet-152 + VGG-19; arithmetic mean of both outputs; training dataset: A1).

    We created the E dataset to assess semi-supervised learning performance by conducting a web-based image search for “tinea,” “onychomycosis,” “nail dystrophy,” “onycholysis,” and “melanonychia” in English, Korean, and Japanese on http://google.com and http://bing.com, downloading a total of 15,844 images. From these images, the R-CNNs created a nail dataset of 3,317 images; many images had to be discarded because of low image resolution. The CNNs (model: ResNet-152 + VGG-19; arithmetic mean of both outputs; training dataset: A1) automatically classified the images generated by the R-CNNs into six classes (760 onychomycosis, 1,316 nail dystrophy, 363 onycholysis, 185 melanonychia, 424 normal, and 269 others).

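
    The "arithmetic mean of both outputs" ensemble can be sketched as averaging the two networks' per-class scores and taking the highest average. This assumes each model outputs one probability per class in the fixed order of the six classes listed above; `ensemble_predict` is a hypothetical helper, not the author's code. The per-class counts are included to show that they add up to the reported 3,317 images.

```python
from typing import List, Sequence, Tuple

CLASSES = ("onychomycosis", "nail dystrophy", "onycholysis",
           "melanonychia", "normal", "others")

# Per-class counts reported for the E dataset.
E_COUNTS = {"onychomycosis": 760, "nail dystrophy": 1316, "onycholysis": 363,
            "melanonychia": 185, "normal": 424, "others": 269}


def ensemble_predict(
    resnet_probs: Sequence[float], vgg_probs: Sequence[float]
) -> Tuple[str, List[float]]:
    """Average two models' class scores and return the top class."""
    if len(resnet_probs) != len(CLASSES) or len(vgg_probs) != len(CLASSES):
        raise ValueError("expected one score per class from each model")
    # Arithmetic mean of the two outputs, class by class.
    averaged = [(a + b) / 2 for a, b in zip(resnet_probs, vgg_probs)]
    top = max(range(len(CLASSES)), key=lambda i: averaged[i])
    return CLASSES[top], averaged
```

    When the models disagree, a confident prediction from one can outweigh a weak preference from the other, which is the usual motivation for averaging.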
  19. Apparel Dataset

    • kaggle.com
    Updated Apr 26, 2020
    Cite
    Kais (2020). Apparel Dataset [Dataset]. https://www.kaggle.com/kaiska/apparel-dataset/code
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 26, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kais
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I created this dataset to practise multi-label classification, following lecture 3 of Jeremy Howard's fast.ai course. The dataset contains 8 different clothing categories in 9 different colours. The aim of multi-label classification here is to label the items found in photos with these categories.

    Content

    The dataset consists of 16,170 images scraped from Google, Bing and DuckDuckGo, in the following categories:

    Black Dress: 450, Black Pants: 870, Black Shirt: 715, Black Shoes: 766, Black Shorts: 328, Black Suit: 320
    Blue Dress: 502, Blue Pants: 798, Blue Shirt: 741, Blue Shoes: 523, Blue Shorts: 299
    Brown Hoodie: 188, Brown Pants: 311, Brown Shoes: 464
    Green Pants: 227, Green Shirt: 230, Green Shoes: 455, Green Shorts: 135, Green Suit: 243
    Pink Hoodie: 347, Pink Pants: 246, Pink Skirt: 513
    Red Dress: 800, Red Hoodie: 349, Red Pants: 308, Red Shirt: 332, Red Shoes: 610
    Silver Shoes: 403, Silver Skirt: 361
    White Dress: 818, White Pants: 274, White Shoes: 600, White Shorts: 120, White Suit: 354
    Yellow Dress: 566, Yellow Shorts: 195, Yellow Skirt: 409
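
    Because every category name combines one colour with one clothing item, each image can be given two labels at once, which is what makes the task multi-label. The sketch below parses the category counts, checks that they add up to 16,170 across 9 colours and 8 items, and builds a multi-hot target vector. It assumes each image has exactly one colour and one item; the label ordering and the `multi_hot` helper are illustrative, not taken from the dataset.

```python
import re
from typing import Dict, List, Tuple

# Category counts copied from the listing ("Colour Item: count").
RAW_COUNTS = (
    "Black Dress: 450 Black Pants: 870 Black Shirt: 715 Black Shoes: 766 "
    "Black Shorts: 328 Black Suit: 320 Blue Dress: 502 Blue Pants: 798 "
    "Blue Shirt: 741 Blue Shoes: 523 Blue Shorts: 299 Brown Hoodie: 188 "
    "Brown Pants: 311 Brown Shoes: 464 Green Pants: 227 Green Shirt: 230 "
    "Green Shoes: 455 Green Shorts: 135 Green Suit: 243 Pink Hoodie: 347 "
    "Pink Pants: 246 Pink Skirt: 513 Red Dress: 800 Red Hoodie: 349 "
    "Red Pants: 308 Red Shirt: 332 Red Shoes: 610 Silver Shoes: 403 "
    "Silver Skirt: 361 White Dress: 818 White Pants: 274 White Shoes: 600 "
    "White Shorts: 120 White Suit: 354 Yellow Dress: 566 Yellow Shorts: 195 "
    "Yellow Skirt: 409"
)

PAIR = re.compile(r"(\w+) (\w+): (\d+)")


def parse_counts(raw: str) -> Dict[Tuple[str, str], int]:
    """Map (colour, item) to its image count."""
    return {(c.lower(), i.lower()): int(n) for c, i, n in PAIR.findall(raw)}


COUNTS = parse_counts(RAW_COUNTS)
COLOURS = sorted({colour for colour, _ in COUNTS})
ITEMS = sorted({item for _, item in COUNTS})
LABELS: List[str] = COLOURS + ITEMS  # 9 colour labels, then 8 item labels


def multi_hot(colour: str, item: str) -> List[int]:
    """Encode one image's (colour, item) pair as a 17-element 0/1 target."""
    active = {colour.lower(), item.lower()}
    return [1 if label in active else 0 for label in LABELS]
```

    Splitting the 37 combined categories into 17 independent labels lets a model share what it learns about, say, "red" across dresses, shirts and shoes.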

    Acknowledgements

    While searching the internet for a good dataset on which to apply multi-label classification, I came across PyImageSearch's article on multi-label classification with Keras, in which Adrian used a very simple, small dataset containing 3 clothing categories. To expand on that dataset, I combined it with trolukovich's dataset and with my own images, scraped from Google and Bing using cwerner's fastclass package.

  20. Australian National Data Service

    • data.wu.ac.at
    • gimi9.com
    html, xml
    Updated Apr 8, 2015
    Cite
    Australian National Data Service (2015). Australian National Data Service [Dataset]. https://data.wu.ac.at/odso/data_gov_au/MjRkNWIxOWYtMmZkMy00M2NjLWIwYzctNmVhYmUzOGM0YjA1
    Available download formats: html, xml
    Dataset updated
    Apr 8, 2015
    Dataset provided by
    Australian Research Data Commons
    Area covered
    Australia
    Description

    Research Data Australia is an Internet-based collection designed to promote the visibility of Australian research data in search engines such as Google and Bing. Research Data Australia aims to provide a comprehensive window into the Australian Research Data Commons. It provides connections between data, projects, researchers and services across organisations and disciplines.

    Research is producing larger and more complex data than ever before. It is imperative that these data outputs are effectively managed and shared. Better data (better described, more connected, more integrated and organised, more accessible, and more easily used for new purposes) allows new questions to be asked, larger issues to be investigated, and data landscapes to be explored.
