100+ datasets found
  1. Recipes Search Engine Results Data

    • kaggle.com
    zip
    Updated Mar 30, 2019
    Cite
    Elias Dabbas (2019). Recipes Search Engine Results Data [Dataset]. https://www.kaggle.com/datasets/eliasdabbas/recipes-search-engine-results-data
    Explore at:
    zip (6875244 bytes). Available download formats
    Dataset updated
    Mar 30, 2019
    Authors
    Elias Dabbas
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Recipe keywords' positions in search results on Google and YouTube.
    These datasets can be interesting for SEO research for the recipes industry.

    Content

    243 national recipes (based on Wikipedia's national dish list)
    2 keyword versions: "<dish> recipe" and "how to make <dish>"
    Total 486 queries (10 results requested each)

    Google: 4,860 rows (up to 10 results per query, some missing)
    YouTube: 1,455 rows (up to 5 results per query, some missing)

    Acknowledgements

    Google CSE API, YouTube API, Python, requests, pandas, advertools.

    Inspiration

    It's interesting to see how recipes are surfaced from a search engine's perspective, and to compare Google and YouTube.
    National dishes are mostly delicious as well!
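The counts above fit together arithmetically; here is a quick sanity check using only the figures stated in this description:

```python
# Derived from the figures stated above; actual files may contain fewer
# rows where results were missing.
dishes = 243              # national recipes from Wikipedia's list
keyword_versions = 2      # "<dish> recipe" and "how to make <dish>"
queries = dishes * keyword_versions

google_rows_max = queries * 10   # Google: up to 10 results per query
youtube_rows_max = queries * 5   # YouTube: up to 5 results per query

print(queries, google_rows_max, youtube_rows_max)  # 486 4860 2430
```

Note that 486 × 10 matches the stated 4,860 Google rows exactly, while the stated 1,455 YouTube rows fall well short of the 2,430 maximum, consistent with the "some missing" caveat.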

  2. Market share of leading desktop search engines worldwide monthly 2015-2025

    • statista.com
    • freeagenlt.com
    • +1 more
    Updated Nov 28, 2025
    Cite
    Statista (2025). Market share of leading desktop search engines worldwide monthly 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2015 - Oct 2025
    Area covered
    Worldwide
    Description

    As of October 2025, Google accounted for ***** percent of global online search engine referrals on desktop devices. Despite being far ahead of its competitors, this represents a modest increase from the previous months. Meanwhile, its longtime competitor Bing accounted for ***** percent, while tools like Yahoo and Yandex held shares of over **** percent and **** percent respectively.

    Google and the global search market

    Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been comparatively marginal. The majority of Google's revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2024, with a market capitalization of **** trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2024, with roughly ****** billion U.S. dollars.

    Search engine usage in different countries

    Google is the most frequently used search engine worldwide, but in some countries its alternatives lead or compete with it to some extent. As of the last quarter of 2023, more than ** percent of internet users in Russia used Yandex, whereas Google users represented a little over ** percent. Meanwhile, Baidu was the most used search engine in China, despite a strong decrease in the percentage of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo alongside Google. By the end of 2024, nearly half of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over ** percent of users in Mexico said they used Yahoo.

  3. Top AI Tools Dataset

    • kaggle.com
    zip
    Updated Sep 5, 2025
    Cite
    Aleesha Nadeem (2025). Top AI Tools Dataset [Dataset]. https://www.kaggle.com/datasets/nalisha/top-ai-tools-dataset
    Explore at:
    zip (1261 bytes). Available download formats
    Dataset updated
    Sep 5, 2025
    Authors
    Aleesha Nadeem
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains a curated collection of AI tools that are widely used across different domains such as text generation, image processing, video editing, coding assistance, research, and education.

    The goal of this dataset is to provide researchers, developers, and learners with a comprehensive reference of AI tools, their categories, features, and use cases. By organizing tools into categories, this dataset makes it easier to analyze, compare, and explore the fast-growing AI ecosystem.

    Dataset Highlights

    Categories of AI Tools (Text, Image, Video, Coding, Productivity, Research, etc.)

    Tool Names and Descriptions

    Key Features

    Use Cases / Applications

    Website / Platform (if available)

    Possible Use Cases

    Data Analysis: Studying trends in AI tool adoption.

    Education: Learning about available AI technologies.

    Development: Identifying the right tools for projects.

    Research: Exploring the evolution of AI tools across industries.

  4. Federal Item Name Directory (H6) Search Tool

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Nov 29, 2020
    Cite
    Department of Defense (2020). Federal Item Name Directory (H6) Search Tool [Dataset]. https://catalog.data.gov/dataset/federal-item-name-directory-h6-search-tool
    Explore at:
    Dataset updated
    Nov 29, 2020
    Dataset provided by
    United States Department of War (https://war.gov/)
    Description

    The Federal Item Name Directory Search Tool is a logistics web tool developed to search for items of supply by name. Federal Item Name Directory provides item name data for the development and maintenance of item identifications within the Federal Catalog System. The tool provides 4 options to search the H6 directory: Keyword, Federal Supply Class (FSC), Federal Item Identification Guide (FIIG), and Item Name Code (INC). The result data is displayed, providing all related data elements.

  5. Data from: Inventory of online public databases and repositories holding...

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

    establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals
    compare how much data is in institutional vs. domain-specific vs. federal platforms
    determine which repositories are recommended by top journals that require or recommend the publication of supporting data
    ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain-specific USDA / ARS datasets and databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of "agricultural data" / "ag data" / "scientific data" + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories, using variations of "agriculture", "ag data" and "university" to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if the institution had a repository for its unique, independent research data, if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University's International Research Institute for Climate and Society, UC Davis's Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on a 2012 and 2016 study of where USDA employees publish their research, ranked by number of articles, including 2015/2016 Impact Factor, author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository; the type of resource searched (datasets, data, images, components, etc.); the percentage of the total database that each term comprised; any dataset with a search term that comprised at least 1% and 5% of the total collection; and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review: Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large and contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection. Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation. See the included README file for descriptions of each individual data file in this dataset.

    Resources in this dataset:
    Resource Title: Journals. File Name: Journals.csv
    Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt

  6. Top AI tools (with purpose)

    • kaggle.com
    zip
    Updated Feb 15, 2025
    Cite
    Mohd Ajeem (2025). Top AI tools (with purpose) [Dataset]. https://www.kaggle.com/datasets/ajeemansari/top-ai-tools-with-purpose/code
    Explore at:
    zip (2170 bytes). Available download formats
    Dataset updated
    Feb 15, 2025
    Authors
    Mohd Ajeem
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📌 Dataset Overview: This dataset contains a comprehensive list of AI tools that are available for free. It is designed to help data scientists, B.Tech students, and AI enthusiasts find useful AI-powered tools without any cost. The dataset includes tool names, categories, descriptions, and purposes, making it a valuable resource for students and professionals in the AI field.

    🔍 Key Features:

    Tool Name: The name of the AI tool.
    Category: Type of AI tool (e.g., Machine Learning, NLP, Computer Vision, Chatbots).
    Description: A brief overview of the tool and its capabilities.
    Purpose: The main use case of the tool (e.g., Data Analysis, Image Processing, Text Generation).
    Free or Freemium: Specifies whether the tool is completely free or has premium features.
    Website/Source: Official website or source for accessing the tool.

    🎯 Purpose of the Dataset:

    For Data Scientists: Helps in discovering free AI tools for research and projects.
    For B.Tech Students: A useful resource for students learning AI and working on academic projects.
    For AI Enthusiasts: Provides a list of AI tools to explore and experiment with.
    For Developers & Researchers: Assists in finding the best AI tools for software development and innovation.
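A listing structured like this is easy to explore with pandas. The sketch below assumes the column names described above and uses a few invented sample rows, not the actual Kaggle file:

```python
import io
import pandas as pd

# Invented sample rows for illustration only; the real dataset's tools
# and values will differ. Columns follow the key features listed above.
csv_text = """Tool Name,Category,Purpose,Free or Freemium
ToolA,NLP,Text Generation,Free
ToolB,Computer Vision,Image Processing,Freemium
ToolC,NLP,Chatbots,Free
"""
df = pd.read_csv(io.StringIO(csv_text))

# How many tools fall into each category, and how many are fully free?
print(df["Category"].value_counts().to_dict())   # {'NLP': 2, 'Computer Vision': 1}
print((df["Free or Freemium"] == "Free").sum())  # 2
```

The same `value_counts` pattern answers the "trends in AI tool adoption" use case once the real file is loaded.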

  7. Free Online Survey Software and Tools Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 3, 2025
    + more versions
    Cite
    Market Report Analytics (2025). Free Online Survey Software and Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/free-online-survey-software-and-tools-54735
    Explore at:
    pdf, ppt, doc. Available download formats
    Dataset updated
    Apr 3, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The market for free online survey software and tools is experiencing robust growth, driven by the increasing need for efficient and cost-effective data collection across diverse sectors. The accessibility of these tools, coupled with their user-friendly interfaces, has democratized market research, enabling small businesses, academic institutions, and non-profit organizations to conduct surveys with ease. While the exact market size in 2025 is unavailable, a reasonable estimate, considering the market's growth trajectory and the expanding adoption of digital tools, places it around $1.5 billion. This growth is fueled by several key drivers: the rising popularity of online research methods, the need for rapid data acquisition and analysis, and the increasing sophistication of free survey software, which now includes advanced analytics and reporting capabilities. Diverse applications across market research, academic studies, internal enterprise management, and other sectors further drive growth. Market segmentation by survey type (mobile vs. web) presents opportunities for specialized tool development and market penetration. Although constraints exist, such as fewer advanced features than paid software and data security concerns, ongoing innovation in free software tools largely mitigates these challenges. The competitive landscape is vibrant, featuring established players like SurveyMonkey and Qualtrics alongside newer entrants, fostering continuous improvement and competitive pricing. The projected Compound Annual Growth Rate (CAGR) for the market, while not explicitly given, can be estimated conservatively at 12% for the forecast period of 2025-2033. This estimate reflects the continued digitalization of market research and the ongoing expansion of the online survey software market.

    The regional breakdown suggests North America and Europe will remain dominant markets, but the Asia-Pacific region is expected to show significant growth, fueled by increasing internet penetration and a burgeoning middle class. The presence of several Chinese companies among the major players further supports this projection. The market will continue to see innovation in areas such as AI-powered survey design and analysis, along with deeper integration with other business software platforms, further driving growth and attracting new users.
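The report's estimates imply a straightforward compound-growth projection. A sketch, noting that both the roughly $1.5 billion 2025 figure and the 12% CAGR are the report's estimates rather than measured values:

```python
# Compound-growth projection of the estimated market size above.
size_2025_bn = 1.5          # estimated 2025 market size, billion USD
cagr = 0.12                 # conservatively estimated CAGR, 2025-2033
years = 2033 - 2025

size_2033_bn = size_2025_bn * (1 + cagr) ** years
print(round(size_2033_bn, 2))  # about 3.71 billion USD by 2033
```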

  8. Business Listings Database (Google My Business Databases)

    • datarade.ai
    .json, .csv
    Updated Mar 22, 2023
    Cite
    DataForSEO (2023). Business Listings Database (Google My Business Databases) [Dataset]. https://datarade.ai/data-products/business-listings-database-google-my-business-databases-dataforseo
    Explore at:
    .json, .csv. Available download formats
    Dataset updated
    Mar 22, 2023
    Dataset authored and provided by
    DataForSEO
    Area covered
    French Polynesia, Kiribati, Saint Martin (French part), Guadeloupe, Barbados, Ireland, Libya, Bulgaria, Puerto Rico, Niger
    Description

    Business Listings Database is the source of point-of-interest data and can provide you with all the information you need to analyze how specific places are used, what kinds of audiences they attract, and how their visitor profile changes over time.

    The full fields description may be found on this page: https://docs.dataforseo.com/v3/databases/business_listings/?bash
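Point-of-interest exports in the .json format are typically consumed record by record. The minimal parsing sketch below uses invented records and hypothetical field names; the actual schema is in the DataForSEO documentation linked above:

```python
import json

# Invented sample records; real field names may differ from these.
lines = [
    '{"title": "Cafe A", "category": "cafe", "rating": 4.5}',
    '{"title": "Garage B", "category": "car_repair", "rating": 3.9}',
    '{"title": "Cafe C", "category": "cafe", "rating": 4.1}',
]

# Parse each JSON record, then filter to one place category.
records = [json.loads(line) for line in lines]
cafes = [r["title"] for r in records if r["category"] == "cafe"]
print(cafes)  # ['Cafe A', 'Cafe C']
```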

  9. Coronavirus Panoply.io for Database Warehousing and Post Analysis using...

    • data.mendeley.com
    Updated Feb 4, 2020
    + more versions
    Cite
    Pranav Pandya (2020). Coronavirus Panoply.io for Database Warehousing and Post Analysis using Sequal Language (SQL) [Dataset]. http://doi.org/10.17632/4gphfg5tgs.2
    Explore at:
    Dataset updated
    Feb 4, 2020
    Authors
    Pranav Pandya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It has never been easier to solve database-related problems with SQL, and the following gives you an opportunity to see how I worked out some of the relationships between tables using the Panoply.io tool.

    I was able to insert coronavirus dataset and create a submittable, reusable result. I hope it helps you work in Data Warehouse environment.

    The following is the list of SQL commands performed on the dataset attached below, with the final output stored in the Exports folder.

    Query 1
    SELECT "Province/State" AS "Region", Deaths, Recovered, Confirmed FROM "public"."coronavirus_updated" WHERE Recovered > (Deaths/2) AND Deaths > 0

    Description: How do we find places where Coronavirus has infiltrated but there is effective recovery amongst patients? We can view those places by requiring the recovered count to exceed half the death toll.

    Query 2
    SELECT country, SUM(confirmed) AS "Confirmed Count", SUM(Recovered) AS "Recovered Count", SUM(Deaths) AS "Death Toll" FROM "public"."coronavirus_updated" WHERE Recovered > (Deaths/2) AND Confirmed > 0 GROUP BY country

    Description: The Coronavirus epidemic has infiltrated multiple countries, and the only way to stay safe is to know which countries have confirmed cases. Here are the per-country confirmed, recovered, and death totals.

    Query 3
    SELECT country AS "Countries where Coronavirus has reached" FROM "public"."coronavirus_updated" WHERE confirmed > 0 GROUP BY country

    Description: The Coronavirus epidemic has infiltrated multiple countries, and the only way to stay safe is to know which countries have confirmed cases. Here is a list of those countries.

    Query 4
    SELECT country, SUM(suspected) AS "Suspected Cases under potential CoronaVirus outbreak" FROM "public"."coronavirus_updated" WHERE suspected > 0 AND deaths = 0 AND confirmed = 0 GROUP BY country ORDER BY SUM(suspected) DESC

    Description: Coronavirus is spreading at an alarming rate. Knowing which countries are newly getting the virus is important because, if timely measures are taken there, casualties can be prevented. Here is a list of suspected cases in countries with no virus-related deaths.

    Query 5
    SELECT country, SUM(suspected) AS "Coronavirus uncontrolled spread count and human life loss", 100*SUM(suspected)/(SELECT SUM(suspected) FROM "public"."coronavirus_updated") AS "Global suspected Exposure of Coronavirus in percentage" FROM "public"."coronavirus_updated" WHERE suspected > 0 AND deaths = 0 GROUP BY country ORDER BY SUM(suspected) DESC

    Description: Coronavirus is gaining ground in particular countries, but how do we measure that? By the percentage of suspected patients in countries that still have no Coronavirus-related deaths. The following is a list.
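For readers working outside a SQL warehouse, these queries translate directly to pandas; for example, Query 1. The rows below are invented samples, not the actual coronavirus_updated table:

```python
import io
import pandas as pd

# Invented sample rows standing in for "public"."coronavirus_updated".
csv_text = """Province/State,Deaths,Recovered,Confirmed
RegionA,10,20,100
RegionB,4,1,50
RegionC,0,5,30
"""
df = pd.read_csv(io.StringIO(csv_text))

# Query 1: regions with Deaths > 0 and Recovered > Deaths / 2,
# i.e. recoveries exceeding half the death toll.
hit = df[(df["Deaths"] > 0) & (df["Recovered"] > df["Deaths"] / 2)]
print(hit["Province/State"].tolist())  # ['RegionA']
```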

    Data Provided by: SRK, Data Scientist at H2O.ai, Chennai, India

  10. Global market share of leading search engines 2015-2025

    • statista.com
    • abripper.com
    Updated Apr 28, 2025
    + more versions
    Cite
    Statista (2025). Global market share of leading search engines 2015-2025 [Dataset]. https://www.statista.com/statistics/1381664/worldwide-all-devices-market-share-of-search-engines/
    Explore at:
    Dataset updated
    Apr 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2015 - Mar 2025
    Area covered
    Worldwide
    Description

    As of March 2025, Google continued to dominate the global search engine industry by far, with an 89.62 percent market share. However, this stronghold may be showing signs of erosion, with its share across all devices dipping to its lowest point in over two decades. Bing, Google's closest competitor, currently holds a market share of 4.01 percent, while Russia-based Yandex ranks third with a share of around 2.51 percent.

    Competitive landscape and regional variations

    While Google's overall dominance persists, other search engines carve out niches in various markets and platforms. Bing holds a 12.21 percent market share across desktop devices worldwide, while Yandex and Baidu have found success inside and outside of their home markets. Yandex is used by over 63 percent of Russian internet users, while Baidu's market share in China has declined significantly. As regional variations highlight the importance of local players in challenging Google's global supremacy, the company is likely to face further challenges from the AI-powered online search trend and increasing regulatory scrutiny.

    Search behavior and antitrust concerns

    Despite facing more competition, Google remains deeply ingrained in users' online habits. In 2024, "Google" itself was the most popular search query on its own platform, followed by "YouTube" - another Google-owned property. This self-reinforcing ecosystem has drawn scrutiny from regulators, with the European Commission imposing substantial antitrust fines on the company. As its influence extends beyond search into various online services, the company's market position continues to be a subject of debate among industry watchdogs and authorities worldwide.

  11. ukbtools: An R package to manage and query UK Biobank data

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Ken B. Hanscombe; Jonathan R. I. Coleman; Matthew Traylor; Cathryn M. Lewis (2023). ukbtools: An R package to manage and query UK Biobank data [Dataset]. http://doi.org/10.1371/journal.pone.0214311
    Explore at:
    pdf. Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ken B. Hanscombe; Jonathan R. I. Coleman; Matthew Traylor; Cathryn M. Lewis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction
    The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names.

    Results
    ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control: exploring the primary demographics of subsets of participants; querying disease diagnoses for one or more individuals and estimating disease frequency relative to a reference variable; and retrieving genetic metadata.

    Conclusion
    Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file will rapidly enable UKB users to undertake their research.

  12. Data from: Database of Ecosystem Based Projects, Programs and Evaluation...

    • research.usc.edu.au
    • researchdata.edu.au
    xls
    Updated Mar 19, 2014
    Cite
    Carmen Elrick-Barr; Ailbhe Travers; Robert C Kay (2014). Database of Ecosystem Based Projects, Programs and Evaluation Tools [Dataset]. https://research.usc.edu.au/esploro/outputs/dataset/Database-of-Ecosystem-Based-Projects-Programs/99448815202621
    Explore at:
    xls (171520 bytes). Available download formats
    Dataset updated
    Mar 19, 2014
    Dataset provided by
    University of the Sunshine Coast
    Authors
    Carmen Elrick-Barr; Ailbhe Travers; Robert C Kay
    Time period covered
    2011
    Description

    This MS Excel Stocktake Database contains a list of EBA tools, as sourced from online and hardcopy sources (Worksheet 'EBA Tools Database'). It also includes a number of projects that may be categorised as Ecosystem Based Adaptation projects (Worksheet 'Profiling of EBA Projects'), as well as a list of tools to evaluate adaptation projects (Worksheet 'Evaluation Tools'). The database was compiled as input into a project that developed a Decision Support Framework for Ecosystem Based Adaptation for the United Nations Environment Program (UNEP). The outputs presented here are interim deliverables for this project.

  13. The ecosystem of technologies for social science research, data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 4, 2020
    Cite
    Daniela Duca (2020). The ecosystem of technologies for social science research, data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3555206
    Explore at:
    Dataset updated
    Feb 4, 2020
    Dataset provided by
    Sage (http://www.sagepublications.com/)
    Authors
    Daniela Duca
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the list of 417 software tools, packages, apps and platforms we have reviewed as part of SAGE Ocean. The dataset contains a number of features: name, pitch, type of tool, country, year, some papers, funders, founders, founding team etc. The latest version of this list will be available on github along with the metadata: https://github.com/danielagduca/SAGE_tools_social_science/tree/master/data

    The supporting white paper describing the tools is:

    Duca, D., & Metzler, K. (2019). The ecosystem of technologies for social science research (White paper). London, UK: Sage. doi: 10.4135/wp191101

  14. Bioinformatic Harvester IV (beta) at Karlsruhe Institute of Technology

    • neuinfo.org
    Updated Jan 29, 2022
    (2022). Bioinformatic Harvester IV (beta) at Karlsruhe Institute of Technology [Dataset]. http://identifiers.org/RRID:SCR_008017
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Harvester is a Web-based tool that bulk-collects bioinformatic data on human proteins from various databases and prediction servers. It is a meta search engine for gene and protein information: it searches 16 major databases and prediction servers and combines the results on pregenerated HTML pages. In this way Harvester can provide comprehensive gene-protein information from different servers in a convenient and fast manner. As a full-text meta search engine, similar to Google™, Harvester allows screening of the whole genome and proteome for current protein functions and predictions in a few seconds. With Harvester it is now possible to compare and check the quality of different database entries and prediction algorithms on a single page. Sponsors: This work has been supported by the BMBF with grants 01GR0101 and 01KW0013.

  15. A Systematic Review of Tools for AI-Augmented Data Quality Management in...

    • zenodo.org
    Updated Jul 14, 2025
    Anastasija Nikiforova; Anastasija Nikiforova; Heidi Carolina Tamm; Heidi Carolina Tamm (2025). A Systematic Review of Tools for AI-Augmented Data Quality Management in Data Warehouses [Dataset]. http://doi.org/10.5281/zenodo.15882760
    Explore at:
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Zenodo
    Authors
    Anastasija Nikiforova; Anastasija Nikiforova; Heidi Carolina Tamm; Heidi Carolina Tamm
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 2025
    Description

    As part of the “From Data Quality for AI to AI for Data Quality: A Systematic Review of Tools for AI-Augmented Data Quality Management in Data Warehouses” study (Tamm & Nikiforova, 2025), a systematic review of DQ tools was conducted to evaluate their automation capabilities, particularly in detecting and recommending DQ rules in data warehouses, a key component of data ecosystems.

    To attain this objective, five key research questions were established.

    Q1. What is the current landscape of DQ tools?

    Q2. What functionalities do DQ tools offer?

    Q3. Which data storage systems do DQ tools support, and where does the processing of the organization’s data occur?

    Q4. What methods do DQ tools use for rule detection?

    Q5. What are the advantages and disadvantages of existing solutions?

    Candidate DQ tools were identified through a combination of rankings from technology reviewers and academic sources. A Google search was conducted using the query (“the best data quality tools” OR “the best data quality software” OR “top data quality tools” OR “top data quality software”) AND "2023" (search conducted in December 2023). Additionally, this list was complemented by DQ tools found in academic articles, identified with two queries in Scopus, namely "data quality tool" OR "data quality software" and ("information quality" OR "data quality") AND ("software" OR "tool" OR "application") AND "data quality rule". To select DQ tools for further systematic analysis, several exclusion criteria were applied: tools from sponsored, outdated (pre-2023), non-English, or non-technical sources were excluded, and academic papers were restricted to those published within the last ten years in the computer science field.

    This resulted in 151 DQ tools, which are provided in the file "DQ Tools Selection".
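    The merging of the two candidate lists described above (web rankings plus academic sources, with deduplication and the pre-2023 exclusion) can be sketched as follows. This is an illustrative sketch only; the tool names and record fields are made up, not the actual 151 tools from the study.

```python
# Illustrative sketch of merging two candidate-tool lists with
# deduplication by normalized name. All names and years are hypothetical.

def merge_candidates(web_tools, academic_tools):
    """Union two candidate lists, deduplicating by normalized tool name."""
    merged = {}
    for tool in web_tools + academic_tools:
        key = tool["name"].strip().lower()
        merged.setdefault(key, tool)  # first occurrence wins
    return sorted(merged.values(), key=lambda t: t["name"].lower())

# Web-sourced candidates: drop entries from outdated (pre-2023) sources,
# mirroring the exclusion criteria in the text.
web = [
    {"name": "ToolA", "source_year": 2023},
    {"name": "ToolB", "source_year": 2021},  # excluded: pre-2023 source
]
web = [t for t in web if t["source_year"] >= 2023]

academic = [
    {"name": "toola", "source_year": 2022},  # duplicate of ToolA
    {"name": "ToolC", "source_year": 2020},
]

candidates = merge_candidates(web, academic)
print([t["name"] for t in candidates])  # ['ToolA', 'ToolC']
```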

    To structure the review process and facilitate answering the established questions (Q1-Q3), a review protocol was developed, consisting of three sections.

    The initial tool assessment was based on availability, functionality, and trialability (e.g., open-source, demo version, or free trial). Tools that were discontinued or lacked sufficient information were excluded.

    The second phase (and protocol section) focused on evaluating the functionalities of the identified tools. Initially, the core DQM functionalities were assessed, such as data profiling, custom DQ rule creation, anomaly detection, data cleansing, report generation, rule detection, and data enrichment. Subsequently, additional data management functionalities such as master data management, data lineage, data cataloging, semantic discovery, and integration were considered.

    The final stage of the review examined the tools' compatibility with data warehouses and General Data Protection Regulation (GDPR) compliance; tools that did not meet these criteria were excluded. As such, the third section of the protocol evaluated the tool's environment and connectivity features, such as whether it operates in the cloud, hybrid, or on-premises, its API support, input data types (.txt, .csv, .xlsx, .json), and its ability to connect to data sources including relational and non-relational databases, data warehouses, cloud data storage, and data lakes. Additionally, it assessed whether the tool processes data on-premises or in the vendor’s cloud environment. Tools were excluded based on criteria such as not supporting data warehouses or processing data externally.

    These protocols (filled) are available in file "DQ Tools Analysis"

  16. Bio Resource for Array Genes Database

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Oct 28, 2017
    (2017). Bio Resource for Array Genes Database [Dataset]. http://identifiers.org/RRID:SCR_000748
    Explore at:
    Dataset updated
    Oct 28, 2017
    Description

    Bio Resource for array genes is a free online resource providing easy access to collective, integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes. The resource covers genes that are represented in UniGene clusters and provides interactive tools to selectively view, analyze and interpret gene expression patterns against the background of gene and protein functional information. Different query options are provided to mine the biological relationships represented in the underlying database; the Search button takes you to the list of available query tools.

    This platform is designed as an online resource to assist researchers in analyzing results of microarray experiments and developing a biological interpretation of the results, chiefly by interpreting unique gene expression patterns as biological changes that can lead to new diagnostic procedures and drug targets. Although other online resources provide comprehensive annotation and summaries of genes, this resource differs by further enabling researchers to mine biological relationships among the genes captured in the database using new query tools, providing a unique way of interpreting microarray results based on the known cellular roles of genes and proteins. A total of six different query tools are provided, each offering different search features, analysis options, and different forms of display and visualization of data.

    The data is collected in a relational database from public resources: UniGene, LocusLink, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, pathways (KEGG, GenMAPP and BioCarta) and BIND (protein interactions). Data is dynamically collected and compiled twice a week from public databases. Search options offer the capability to organize and cluster genes based on their interactions in biological pathways, their association with Gene Ontology terms, tissue/organ-specific expression, or any other user-chosen functional grouping of genes. A color-coding scheme is used to highlight differential gene expression patterns against a background of gene functional information. Concept hierarchies (Anatomy and Diseases) of MeSH (Medical Subject Headings) terms are used to organize and display the data related to tissue-specific expression and diseases.

    Sponsors: The BioRag database is maintained by the Bioinformatics group at the Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. © 2002-2003 University of Arizona.

  17. Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Oct 22, 2020
    Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman, (2020). Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484626
    Explore at:
    Dataset updated
    Oct 22, 2020
    Authors
    Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman,
    Description

    MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.

  18. Corporations Search (Washington state)

    • s.cnmilf.com
    • data.wa.gov
    • +1more
    Updated Sep 6, 2024
    data.wa.gov (2024). Corporations Search (Washington state) [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/corporations-search-from-secretary-of-state
    Explore at:
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    data.wa.gov
    Area covered
    Washington
    Description

    This provides a link to the Washington Secretary of State's Corporations Search tool. The Corporations Data Extract feature is no longer available. Customers needing a list of multiple businesses can use our advanced search to create a list of businesses under specific parameters, and export this information to an Excel spreadsheet to sort and search more extensively. The more specific the search parameters, the narrower the search results. The steps are:

    1. Visit our Corporations and Charities Filing System by following this link: https://ccfs.sos.wa.gov/

    2. Scroll down to the “Corporation Search” section and click the “Advanced Search” button on the right.

    3. Under the first section, specify how you would like the business name searched. Only use this for single business lookups unless all the businesses you are searching have a common name (use the “contains” selection).

    4. Select the appropriate business type from the dropdown if you are looking for a list of a specific business type. For a list of a particular business type with a specific status, select that status under “Business Status.” You can also search by expiration date in this section.

    5. Under “Date of Incorporation/Formation/Registration,” you can search by start or end date.

    6. Under the “Registered Agent/Governor Search” section, you can search all businesses with the same registered agent on record or governor listed.

    7. Once you have made all your search selections, click the green “Search” button at the bottom right of the page.

    8. A list will populate; scroll to the bottom and select the green Excel document icon labeled CSV. An Excel document should download automatically. If you have popups blocked, please unblock our site and try again.

    9. Once you have opened the downloaded spreadsheet, you can adjust the width of each column and sort the data using the Data tab. You can also search by pressing CTRL+F on a Windows keyboard.
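    For larger exports, the same sorting and filtering can be done programmatically instead of in Excel. A minimal sketch with Python's standard csv module follows; the column names ("BusinessName", "Status", "RegistrationDate") are hypothetical placeholders, not the actual CCFS export schema.

```python
import csv
import io

# Hypothetical excerpt of a CSV exported from the advanced search.
# Column names are illustrative only.
exported = io.StringIO(
    "BusinessName,Status,RegistrationDate\n"
    "Acme LLC,Active,2019-05-01\n"
    "Beta Corp,Dissolved,2015-11-20\n"
    "Cedar Inc,Active,2021-02-14\n"
)

rows = list(csv.DictReader(exported))

# Keep active businesses and sort by registration date, newest first.
active = [r for r in rows if r["Status"] == "Active"]
active.sort(key=lambda r: r["RegistrationDate"], reverse=True)

for r in active:
    print(r["BusinessName"], r["RegistrationDate"])
# Cedar Inc 2021-02-14
# Acme LLC 2019-05-01
```

    With a real export, replace the in-memory `io.StringIO` buffer with `open("export.csv", newline="")`.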

  19. Data Use in Academia Dataset

    • datacatalog.worldbank.org
    csv, utf-8
    Updated Nov 27, 2023
    Semantic Scholar Open Research Corpus (S2ORC) (2023). Data Use in Academia Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0065200/data_use_in_academia_dataset
    Explore at:
    utf-8, csvAvailable download formats
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Semantic Scholar Open Research Corpus (S2ORC)
    Brian William Stacy
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cchttps://datacatalog.worldbank.org/public-licenses?fragment=cc

    Description

    This dataset contains metadata (title, abstract, date of publication, field, etc.) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.


    Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.


    We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and parsed PDF or latex file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and latex file are important for extracting important information like the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical system: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.
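    The corpus restrictions above can be sketched as a simple record filter. This is an illustrative sketch; the field labels and records below are made up, not actual S2ORC entries.

```python
# Sketch of the corpus restrictions described above: abstract present,
# parsable PDF/LaTeX, publication year 2000-2020, and field not in the
# excluded list. Records and field labels are illustrative only.
EXCLUDED_FIELDS = {
    "Biology", "Chemistry", "Engineering", "Physics", "Materials Science",
    "Environmental Science", "Geology", "History", "Philosophy", "Math",
    "Computer Science", "Art",
}

def keep(article):
    return bool(
        article.get("abstract")              # abstract required
        and article.get("has_parsed_pdf")    # parsable PDF/LaTeX required
        and 2000 <= article["year"] <= 2020  # publication window
        and article["field"] not in EXCLUDED_FIELDS
    )

corpus = [
    {"abstract": "...", "has_parsed_pdf": True, "year": 2015, "field": "Economics"},
    {"abstract": "...", "has_parsed_pdf": True, "year": 1998, "field": "Sociology"},   # too old
    {"abstract": "",    "has_parsed_pdf": True, "year": 2010, "field": "Medicine"},    # no abstract
    {"abstract": "...", "has_parsed_pdf": True, "year": 2012, "field": "Physics"},     # excluded field
]
kept = [a for a in corpus if keep(a)]
print(len(kept))  # 1
```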


    Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.


    The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.


    To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
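    The regular-expression approach can be sketched as follows. This is a minimal illustration: the three-country list stands in for the full ISO 3166 name set, and the function names are my own, not from the study's code.

```python
import re

# Minimal sketch of country detection by regular expression.
# The country list here is a stand-in for the full ISO 3166 name set.
COUNTRIES = ["Kenya", "Brazil", "Viet Nam"]

# \b word boundaries avoid matching inside other words;
# IGNORECASE tolerates capitalization differences.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in COUNTRIES) + r")\b",
    re.IGNORECASE,
)

def countries_of_study(text):
    """Return the sorted, deduplicated set of country names found in text."""
    return sorted({m.group(1).title() for m in pattern.finditer(text)})

abstract = "We study maize yields in Kenya and compare them with Brazil."
print(countries_of_study(abstract))  # ['Brazil', 'Kenya']
```

    Note that an abstract spelling "Viet Nam" as "Vietnam" would not match here, which illustrates the exclusion-error caveat mentioned in the text.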


    The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.


    The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:


    Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.

    There are two classification tasks in this exercise:

    1. Identifying whether an academic article is using data from any country

    2. Identifying from which country that data came.

    For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using survey data uses data. Some clues that a study does use data include whether a survey or census is described, a statistical model estimated, or a table of means or summary statistics reported.

    After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]

    For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.

    We expect between 10 and 35 percent of all articles to use data.


    The median amount of time that a worker spent on an article, measured as the time between when the article was accepted for classification by the worker and when the classification was submitted, was 25.4 minutes. If human raters were exclusively used rather than machine learning tools, then the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, which assumes a cost of $3 per article as was paid to MTurk workers.


    A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. (2018)). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model. 900 articles were selected because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted an article used data with at least 90% confidence.
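    The final decision rule, labeling an article "uses data" only when the model's confidence reaches 90%, can be sketched in isolation. The logits below are made-up example values, not output from the actual DistilBERT model.

```python
import math

# Sketch of the 90%-confidence decision rule described above.
# Logit values are hypothetical examples.

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.90):
    """Label 'uses data' only when that class's probability >= threshold."""
    p_no_data, p_uses_data = softmax(logits)
    return "uses data" if p_uses_data >= threshold else "no data"

print(classify([0.2, 3.5]))  # confident positive
print(classify([1.0, 1.3]))  # uncertain -> defaults to "no data"
```

    A high threshold like 0.90 trades recall for precision: borderline articles are labeled "no data" rather than risking false positives.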


    The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model perform the same kind of errors, then the performance reported here will be overestimated.


    The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of

  20. Cost Management (Activity-Based) - Raw Source Data

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    Updated Oct 29, 2025
    Anez, Diomar; Anez, Dimar (2025). Cost Management (Activity-Based) - Raw Source Data [Dataset]. http://doi.org/10.7910/DVN/8GJH2G
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Anez, Diomar; Anez, Dimar
    Description

    This dataset contains raw, unprocessed data files pertaining to the management tool group focused on 'Activity-Based Costing' (ABC) and 'Activity-Based Management' (ABM). The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization.

    Data Sources & File Details:

    Google Trends File (Prefix: GT_): Metric: Relative Search Interest (RSI) Index (0-100 scale). Keywords Used: "activity based costing" + "activity based management" + "activity based costing management". Time Period: January 2004 - January 2025 (Native Monthly Resolution). Scope: Global Web Search, broad categorization. Extraction Date: January 2025. Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling. Source URL: Google Trends Query

    Google Books Ngram Viewer File (Prefix: GB_): Metric: Annual Relative Frequency (% of total n-grams in the corpus). Keywords Used: Activity Based Management + Activity Based Costing. Time Period: 1950 - 2022 (Annual Resolution). Corpus: English. Parameters: Case Insensitive OFF, Smoothing 0. Extraction Date: January 2025. Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage). Source URL: Ngram Viewer Query

    Crossref.org File (Prefix: CR_): Metric: Absolute count of publications per month matching keywords. Keywords Used: ("activity based costing" OR "activity based management") AND ("management" OR "accounting" OR "cost control" OR "financial" OR "analysis" OR "system"). Time Period: 1950 - 2025 (queried for monthly counts based on publication date metadata). Search Fields: Title, Abstract. Extraction Date: January 2025. Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted. Source URL: Crossref Search Query

    Bain & Co. Survey - Usability File (Prefix: BU_): Metric: Original Percentage (%) of executives reporting tool usage. Tool Names/Years Included: Activity-Based Costing (1993); Activity-Based Management (1999, 2000, 2002, 2004). (Note: Some sources use Activity Based Management.) Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 1994, 2001, 2003, 2005). Note: Tool potentially not surveyed or reported after 2004 under these specific names. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1999/475; 2000/214; 2002/708; 2004/960.

    Bain & Co. Survey - Satisfaction File (Prefix: BS_): Metric: Original Average Satisfaction Score (Scale 0-5). Tool Names/Years Included: Activity-Based Costing (1993); Activity-Based Management (1999, 2000, 2002, 2004). (Note: Some sources use Activity Based Management.) Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 1994, 2001, 2003, 2005). Note: Tool potentially not surveyed or reported after 2004 under these specific names. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1999/475; 2000/214; 2002/708; 2004/960. Reflects subjective executive perception of utility.

    File Naming Convention: Files generally follow the pattern PREFIX_Tool.csv, where the PREFIX indicates the data source: GT_: Google Trends; GB_: Google Books Ngram; CR_: Crossref.org (count data for this raw dataset); BU_: Bain & Company Survey (Usability); BS_: Bain & Company Survey (Satisfaction). The essential identification comes from the PREFIX and the Tool Name segment. This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
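    The DOI-based deduplication mentioned in the Crossref notes (one record per DOI, records without DOIs omitted) can be sketched as follows. The records below are illustrative placeholders, not actual Crossref responses.

```python
# Sketch of the Crossref deduplication rule described above:
# keep one record per DOI and omit records without a DOI.
# Records are illustrative only.

def dedupe_by_doi(records):
    seen = set()
    kept = []
    for rec in records:
        doi = rec.get("DOI")
        if not doi:
            continue        # records without DOIs are omitted
        doi = doi.lower()   # DOIs are case-insensitive
        if doi in seen:
            continue        # duplicate DOI already kept
        seen.add(doi)
        kept.append(rec)
    return kept

records = [
    {"title": "ABC in practice", "DOI": "10.1000/xyz123"},
    {"title": "ABC in practice (reprint)", "DOI": "10.1000/XYZ123"},  # duplicate
    {"title": "Untracked preprint"},                                  # no DOI
]
print(len(dedupe_by_doi(records)))  # 1
```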

