100+ datasets found
  1. AI Dataset Search Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 21, 2025
    Cite
    Growth Market Reports (2025). AI Dataset Search Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-dataset-search-platform-market
    Explore at:
    Available download formats
    pptx, pdf, csv
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI Dataset Search Platform Market Outlook



    According to our latest research, the global AI Dataset Search Platform market size is valued at USD 1.18 billion in 2024, with a robust year-over-year expansion driven by the escalating demand for high-quality datasets to fuel artificial intelligence and machine learning initiatives across industries. The market is expected to grow at a CAGR of 22.6% from 2025 to 2033, reaching an estimated USD 9.62 billion by 2033. This exponential growth is primarily attributed to the increasing recognition of data as a strategic asset, the proliferation of AI applications across sectors, and the need for efficient, scalable, and secure platforms to discover, curate, and manage diverse datasets.



    One of the primary growth factors propelling the AI Dataset Search Platform market is the exponential surge in AI adoption across both public and private sectors. Businesses and institutions are increasingly leveraging AI to gain competitive advantages, enhance operational efficiencies, and deliver personalized experiences. However, the effectiveness of AI models is fundamentally reliant on the quality and diversity of training datasets. As organizations strive to accelerate their AI initiatives, the need for platforms that can efficiently search, aggregate, and validate datasets from disparate sources has become paramount. This has led to a significant uptick in investments in AI dataset search platforms, as they enable faster data discovery, reduce development cycles, and ensure compliance with data governance standards.



    Another key driver for the market is the growing complexity and volume of data generated from emerging technologies such as IoT, edge computing, and connected devices. The sheer scale and heterogeneity of data sources necessitate advanced search platforms equipped with intelligent indexing, semantic search, and metadata management capabilities. These platforms not only facilitate the identification of relevant datasets but also support data annotation, labeling, and preprocessing, which are critical for building robust AI models. Furthermore, the integration of AI-powered search algorithms within these platforms enhances the accuracy and relevance of search results, thereby improving the overall efficiency of data scientists and AI practitioners.



    Additionally, regulatory pressures and the increasing emphasis on ethical AI have underscored the importance of transparent and auditable data sourcing. Organizations are compelled to demonstrate the provenance and integrity of the datasets used in their AI models to mitigate risks related to bias, privacy, and compliance. AI dataset search platforms address these challenges by providing traceability, version control, and access management features, ensuring that only authorized and compliant datasets are utilized. This not only reduces legal and reputational risks but also fosters trust among stakeholders, further accelerating market adoption.



    From a regional perspective, North America dominates the AI Dataset Search Platform market in 2024, accounting for over 38% of the global revenue. This leadership is driven by the presence of major technology providers, a mature AI ecosystem, and substantial investments in research and development. Europe follows closely, benefiting from stringent data privacy regulations and strong government support for AI innovation. The Asia Pacific region is experiencing the fastest growth, propelled by rapid digital transformation, expanding AI research communities, and increasing government initiatives to foster AI adoption. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions gradually embrace AI-driven solutions.





    Component Analysis



    The AI Dataset Search Platform market by component is segmented into platforms and services, each playing a pivotal role in the ecosystem. The platform segment encompasses the core software infrastructure that enables users to search, index, curate, and manage datasets. This segmen

  2. Data for study "Direct Answers in Google Search Results"

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 9, 2020
    Cite
    Strzelecki, Artur; Rutecka, Paulina (2020). Data for study "Direct Answers in Google Search Results" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3541091
    Explore at:
    Dataset updated
    Jun 9, 2020
    Dataset provided by
    University of Economics in Katowice
    Authors
    Strzelecki, Artur; Rutecka, Paulina
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The goal of this research is to examine direct answers in the Google web search engine. The dataset was collected using Senuto (https://www.senuto.com/). Senuto is an online tool that extracts data on website visibility from the Google search engine.

    Dataset contains the following elements:

    keyword,

    number of monthly searches,

    featured domain,

    featured main domain,

    featured position,

    featured type,

    featured url,

    content,

    content length.

    The dataset with visibility structure has 743,798 keywords that resulted in SERPs with a direct answer.

  3. Google Trends

    • kaggle.com
    zip
    Updated Jun 23, 2023
    Cite
    Muhammed Tausif (2023). Google Trends [Dataset]. https://www.kaggle.com/datasets/muhammedtausif/data-science-trends-on-google
    Explore at:
    Available download formats
    zip (160052 bytes)
    Dataset updated
    Jun 23, 2023
    Authors
    Muhammed Tausif
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is taken from Google Trends. It shows the trend of the "Data Science" search term on the Google search engine and YouTube from 2004 to April 2022. There will be an update soon.

  4. Belarus Internet Usage: Search Engine Market Share: Desktop: StartPagina...

    • ceicdata.com
    Updated Mar 9, 2025
    Cite
    CEICdata.com (2025). Belarus Internet Usage: Search Engine Market Share: Desktop: StartPagina (Google) [Dataset]. https://www.ceicdata.com/en/belarus/internet-usage-search-engine-market-share/internet-usage-search-engine-market-share-desktop-startpagina-google
    Explore at:
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 1, 2025 - Mar 9, 2025
    Area covered
    Belarus
    Description

    Belarus Internet Usage: Search Engine Market Share: Desktop: StartPagina (Google) data was reported at 0.000 % in 09 Mar 2025. This records a decrease from the previous number of 0.030 % for 08 Mar 2025. Belarus Internet Usage: Search Engine Market Share: Desktop: StartPagina (Google) data is updated daily, averaging 0.070 % from Mar 2025 (Median) to 09 Mar 2025, with 9 observations. The data reached an all-time high of 0.070 % in 05 Mar 2025 and a record low of 0.000 % in 09 Mar 2025. Belarus Internet Usage: Search Engine Market Share: Desktop: StartPagina (Google) data remains active status in CEIC and is reported by Statcounter Global Stats. The data is categorized under Global Database’s Belarus – Table BY.SC.IU: Internet Usage: Search Engine Market Share.

  5. PredSearch | Web Search Data, Keyword Data, Online Search Trends Data |...

    • avance-online-sl.mydatastorefront.com
    Updated Jun 23, 2024
    Cite
    Predsearch (2024). PredSearch | Web Search Data, Keyword Data, Online Search Trends Data | Amazon, Google, TikTok - 2 years history | Global coverage | +500k/w keywords [Dataset]. https://avance-online-sl.mydatastorefront.com/products/predsearch-web-search-data-us-amazon-google-tiktok-predsearch
    Explore at:
    Dataset updated
    Jun 23, 2024
    Dataset authored and provided by
    Predsearch
    Area covered
    Netherlands, Sweden, Australia, Mexico, Japan, Italy, Spain, United States, France, Germany
    Description

    Ranked by keyword, the Web Search Data consists of:

    • 25+ consumer categories
    • Insights from Top Brands, Top Products, Click Share, Conversion Share, Product Competitors per Search Term and Technical Product Specifications
    • 2+ years of historical coverage
    • 13+ markets
  6. Data from: Semantic Query Analysis from the Global Science Gateway

    • ssh.datastations.nl
    • datasearch.gesis.org
    bin, pdf, zip
    Updated Feb 8, 2018
    Cite
    C. Carlesi; C. Carlesi (2018). Semantic Query Analysis from the Global Science Gateway [Dataset]. http://doi.org/10.17026/DANS-25M-FHE2
    Explore at:
    Available download formats
    pdf (14994765), zip (19837), bin (19672036), pdf (1349455), pdf (1431355)
    Dataset updated
    Feb 8, 2018
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    C. Carlesi; C. Carlesi
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Nowadays web portals play an essential role in searching and retrieving information in several fields of knowledge: they are ever more technologically advanced and designed to support the storage of a huge amount of information in natural language originating from the queries launched by users worldwide.

    A good example is given by the WorldWideScience search engine:

    The database is available at . It is based on a similar gateway, Science.gov, which is the major path to U.S. government science information, as it pulls together Web-based resources from various agencies. The information in the database is intended to be of high quality and authority, as well as the most current available from the participating countries in the Alliance, so users will find that the results will be more refined than those from a general search of Google. It covers the fields of medicine, agriculture, the environment, and energy, as well as basic sciences. Most of the information may be obtained free of charge (the database itself may be used free of charge) and is considered “open domain.” As of this writing, there are about 60 countries participating in WorldWideScience.org, providing access to 50+ databases and information portals. Not all content is in English. (Bronson, 2009)

    Given this scenario, we focused on building a corpus constituted by the query logs registered by the GreyGuide: Repository and Portal to Good Practices and Resources in Grey Literature and received by the WorldWideScience.org (The Global Science Gateway) portal: the aim is to retrieve information related to social media, which as of today represent a considerable source of data more and more widely used for research ends.

    This project includes eight months of query logs registered between July 2017 and February 2018 for a total of 445,827 queries.

    The analysis mainly concentrates on the semantics of the queries received from the portal clients: it is a process of information retrieval from a rich digital catalogue whose language is dynamic, is evolving and follows – as well as reflects – the cultural changes of our modern society.

  7. Yemen Google Search Trends: Economic Measures: Mortgage Loan

    • ceicdata.com
    Updated May 3, 2023
    Cite
    CEICdata.com (2023). Yemen Google Search Trends: Economic Measures: Mortgage Loan [Dataset]. https://www.ceicdata.com/en/yemen/google-search-trends-by-categories
    Explore at:
    Dataset updated
    May 3, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 18, 2025 - Nov 29, 2025
    Area covered
    Yemen
    Description

    Google Search Trends: Economic Measures: Mortgage Loan data was reported at 0.000 Score in 29 Nov 2025. This stayed constant from the previous number of 0.000 Score for 28 Nov 2025. Google Search Trends: Economic Measures: Mortgage Loan data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 29 Nov 2025, with 1460 observations. The data reached an all-time high of 100.000 Score in 23 Jan 2022 and a record low of 0.000 Score in 29 Nov 2025. Google Search Trends: Economic Measures: Mortgage Loan data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Yemen – Table YE.Google.GT: Google Search Trends: by Categories.

  8. Google SERP(search engine result) /SEO search Data

    • kaggle.com
    zip
    Updated Apr 23, 2022
    Cite
    BarkingData (2022). Google SERP(search engine result) /SEO search Data [Dataset]. https://www.kaggle.com/datasets/polartech/google-serpsearch-engine-result-seo-search-data
    Explore at:
    Available download formats
    zip (6565970 bytes)
    Dataset updated
    Apr 23, 2022
    Authors
    BarkingData
    Description

    Context: One of the important tasks in SEO analysis is to check rankings and product listing ads on search engines. This dataset contains Google SERPs (search engine result pages) for 500+ keywords related to pet food, furniture, clothing and a lot more, for both PC and mobile platforms.

    Content: 500+ keywords searched from 2 locations: San Francisco and NYC, United States. Data includes organic search results, map results, PLA (product listing ads), top ads, bottom ads, merchant domains etc.

    Contact info@barkingdata.com if you are interested in building similar types of SEO/SERP datasets. We specialize in web mining and web data harvesting from the world wide web (including mobile apps), and we have built 5000+ datasets for researchers, analysts, scholars, retailers, ... Learn more at https://www.barkingdata.com

  9. Job Offers Web Scraping Search

    • kaggle.com
    zip
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search
    Explore at:
    Available download formats
    zip (5322 bytes)
    Dataset updated
    Feb 11, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Offers Web Scraping Search

    Targeted Results to Find the Optimal Work Solution

    About this dataset

    This dataset collects job offers from web scraping which are filtered according to specific keywords, locations and times. This data gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match their personal situation, skillset and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, time frames and other necessary parameters so you can make a smart choice for your next career opportunity.

    How to use the dataset

    This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:

    • Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.

    • Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!

    • Finally, consider when the position is available: the Time frame column indicates when each posting was made and whether it is a full-time, part-time, casual, or temporary role, so check that it meets your requirements before applying.

    • Additionally, if hours per week or further schedule details are important criteria, this information is provided in the Horari and Temps_Oferta columns. Once all three criteria (keywords, location, and time frame) have been checked, look at the Empresa (company name) and Nom_Oferta (post name) columns to get an idea of who would be employing you.

      Together, these pieces of data should give any motivated individual what they need to seek out an optimal work solution. Keep hunting, and good luck!

    Research Ideas

    • Machine learning can be used to group job offers in order to facilitate the identification of similarities and differences between them. This could allow users to specifically target their search for a work solution.
    • The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.
    • It may also provide an insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: web_scraping_information_offers.csv

    | Column name  | Description                         |
    |:-------------|:------------------------------------|
    | Nom_Oferta   | Name of the job offer. (String)     |
    | Empresa      | Company offering the job. (String)  |
    | Ubicació     | Location of the job offer. (String) |
    | Temps_Oferta | Time of the job offer. (String)     |
    | Horari       | Schedule of the job offer. (String) |
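
A minimal sketch of how the columns above might be read and filtered with Python's standard `csv` module. It assumes the CSV has a header row using exactly the listed column names. The helpers `load_offers` and `filter_by_location` and the sample rows are hypothetical, included only for illustration.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["Nom_Oferta", "Empresa", "Ubicació", "Temps_Oferta", "Horari"]


def load_offers(handle):
    """Read job offers from an open CSV file object into a list of dicts.

    Assumes the first row is a header containing the columns above.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return [{c: row[c] for c in COLUMNS} for row in reader]


def filter_by_location(offers, place):
    """Keep offers whose Ubicació contains `place`, case-insensitively."""
    needle = place.casefold()
    return [o for o in offers if needle in o["Ubicació"].casefold()]


# Hypothetical sample rows standing in for the real file contents.
SAMPLE = io.StringIO(
    "Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari\n"
    "Data Analyst,Example SA,Barcelona,2 days ago,Full time\n"
    "Web Developer,Demo SL,Girona,1 week ago,Part time\n"
)

offers = load_offers(SAMPLE)
print(filter_by_location(offers, "barcelona"))
```

With the real file, the same function could be called on `open("web_scraping_information_offers.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.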

  10. Datasys | Clickstream Data | Keyword Sets (200M+ daily searches | global...

    • datarade.ai
    .json
    Updated May 12, 2022
    Cite
    Datasys (2022). Datasys | Clickstream Data | Keyword Sets (200M+ daily searches | global coverage) [Dataset]. https://datarade.ai/data-products/datasys-clickstream-data-keyword-sets-200m-daily-search-datasys
    Explore at:
    Available download formats
    .json
    Dataset updated
    May 12, 2022
    Dataset authored and provided by
    Datasys
    Area covered
    Saint Vincent and the Grenadines, Bermuda, Grenada, Saint Kitts and Nevis, Palestine, Vietnam, Belize, Guadeloupe, Saint Pierre and Miquelon, Myanmar
    Description

    Datasys Keyword Sets provide search activity datasets at scale, capturing the exact terms consumers use across industries. This data reveals category interest, trending keywords, and search frequency, supporting SEO strategy, competitive benchmarking, and campaign targeting. Updated daily for real-time consumer insights.

  11. Yemen Google Search Trends: Online Movie: Pornhub

    • ceicdata.com
    Updated Nov 28, 2025
    Cite
    CEICdata.com (2025). Yemen Google Search Trends: Online Movie: Pornhub [Dataset]. https://www.ceicdata.com/en/yemen/google-search-trends-by-categories/google-search-trends-online-movie-pornhub
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 17, 2025 - Nov 28, 2025
    Area covered
    Yemen
    Description

    Yemen Google Search Trends: Online Movie: Pornhub data was reported at 9.000 Score in 28 Nov 2025. This records an increase from the previous number of 8.000 Score for 27 Nov 2025. Yemen Google Search Trends: Online Movie: Pornhub data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 28 Nov 2025, with 1459 observations. The data reached an all-time high of 57.000 Score in 09 Feb 2022 and a record low of 0.000 Score in 16 Nov 2025. Yemen Google Search Trends: Online Movie: Pornhub data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s Yemen – Table YE.Google.GT: Google Search Trends: by Categories.

  12. Data from: Search and Harvesting across NFDI Consortia - Gaps and Challenges...

    • meta4ds.fokus.fraunhofer.de
    pdf, unknown
    Updated Oct 10, 2023
    Cite
    Zenodo (2023). Search and Harvesting across NFDI Consortia - Gaps and Challenges [Dataset]. https://meta4ds.fokus.fraunhofer.de/datasets/oai-zenodo-org-8426850?locale=en
    Explore at:
    Available download formats
    pdf (616450), unknown
    Dataset updated
    Oct 10, 2023
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Search and harvesting use cases on harmonized metadata play an important role in several NFDI consortia (see Meta(data), Terminology and Provenance section concept). The working group Search and Harvesting works on a common understanding of user requirements (for search) and service requirements (for harvesting), analysis of the data sources landscape, and recommendations - with respect to common and specific needs, e.g., for spatial or sensitive data. On this poster, we present as our first outcome an overview on identified and structured search and harvesting gaps and challenges across NFDI consortia, which fosters a common understanding of a multidisciplinary vision for search & harvesting solutions.

  13. Germany Real-time Search Trends Data

    • highfrequency.it.com
    json
    Updated Nov 18, 2025
    Cite
    High Frequency Words (2025). Germany Real-time Search Trends Data [Dataset]. https://highfrequency.it.com/de
    Explore at:
    Available download formats
    json
    Dataset updated
    Nov 18, 2025
    Dataset provided by
    High Frequency Words
    Time period covered
    Nov 18, 2025
    Area covered
    Germany
    Description

    Minute-by-minute updated keyword database from Google, featuring 250 trending search terms

  14. Vector Database Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 20, 2025
    Cite
    Data Insights Market (2025). Vector Database Software Report [Dataset]. https://www.datainsightsmarket.com/reports/vector-database-software-529421
    Explore at:
    Available download formats
    pdf, ppt, doc
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Vector Database Software market is poised for substantial growth, projected to reach an estimated $XXX million in 2025, with an impressive Compound Annual Growth Rate (CAGR) of XX% during the forecast period of 2025-2033. This rapid expansion is fueled by the increasing adoption of AI and machine learning across industries, necessitating efficient storage and retrieval of unstructured data like images, audio, and text. The burgeoning demand for enhanced search capabilities, personalized recommendations, and advanced anomaly detection is driving the market forward. Key market drivers include the widespread implementation of large language models (LLMs), the growing need for semantic search functionalities, and the continuous innovation in AI-powered applications. The market is segmenting into applications catering to both Small and Medium-sized Enterprises (SMEs) and Large Enterprises, with a clear shift towards Cloud-based solutions owing to their scalability, cost-effectiveness, and ease of deployment. The vector database landscape is characterized by dynamic innovation and fierce competition, with prominent players like Pinecone, Weaviate, Supabase, and Zilliz Cloud leading the charge. Emerging trends such as the development of hybrid search capabilities, integration with existing data infrastructure, and enhanced security features are shaping the market's trajectory. While the market shows immense promise, certain restraints, including the complexity of data integration and the need for specialized technical expertise, may pose challenges. Geographically, North America is expected to dominate the market share due to its early adoption of AI technologies and robust R&D investments, followed closely by Asia Pacific, which is witnessing rapid digital transformation and a surge in AI startups. Europe and other emerging regions are also anticipated to contribute significantly to market growth as AI adoption becomes more widespread. 
    This report delves into the rapidly evolving Vector Database Software Market, providing a detailed analysis of its landscape from 2019 to 2033. With a Base Year of 2025, the report offers crucial insights for the Estimated Year of 2025 and projects market dynamics through the Forecast Period of 2025-2033, building upon the Historical Period of 2019-2024. The global vector database software market is poised for significant expansion, with an estimated market size projected to reach hundreds of millions of dollars by 2025, and anticipated to grow exponentially in the coming years. This growth is fueled by the increasing adoption of AI and machine learning across various industries, necessitating efficient storage and retrieval of high-dimensional vector data.

  15. Efficient Keyword-Based Search for Top-K Cells in Text Cube - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). Efficient Keyword-Based Search for Top-K Cells in Text Cube - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/efficient-keyword-based-search-for-top-k-cells-in-text-cube
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.

    Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011.

  16. Data for "Prediction of Search Targets From Fixations in Open-World...

    • darus.uni-stuttgart.de
    Updated Oct 28, 2022
    Cite
    Andreas Bulling (2022). Data for "Prediction of Search Targets From Fixations in Open-World Settings" [Dataset]. http://doi.org/10.18419/DARUS-3226
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 28, 2022
    Dataset provided by
    DaRUS
    Authors
    Andreas Bulling
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    World
    Dataset funded by
    DFG
    Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University
    Description

    We designed a human study to collect fixation data during visual search. We opted for a task that involved searching for a single image (the target) within a synthesised collage of images (the search set). Each collage is a random permutation of a finite set of images. To explore the impact of the similarity in appearance between target and search set on both fixation behaviour and automatic inference, we have created three different search tasks covering a range of similarities. In prior work, colour was found to be a particularly important cue for guiding search to targets and target-similar objects. Therefore, for the first task we selected 78 coloured O'Reilly book covers to compose the collages. These covers show a woodcut of an animal at the top and the title of the book in a characteristic font underneath. Given that overall cover appearance was very similar, this task allows us to analyse fixation behaviour when colour is the most discriminative feature. For the second task we use a set of 84 book covers from Amazon. In contrast to the first task, the appearance of these covers is more diverse. This makes it possible to analyse fixation behaviour when both structure and colour information could be used by participants to find the target. Finally, for the third task, we use a set of 78 mugshots from a public database of suspects. In contrast to the other tasks, we transformed the mugshots to grey-scale so that they did not contain any colour information. This allows analysis of fixation behaviour when colour information was not available at all. We found faces to be particularly interesting given the relevance of searching for faces in many practical applications.

    • 18 participants (9 males), age 18-30
    • Gaze data recorded with a stationary Tobii TX300 eye tracker

    More information about the dataset can be found in the README file.

  17. Livestock and Grain Market News Search

    • catalog.data.gov
    • datasets.ai
    Updated Apr 21, 2025
    Cite
    Agricultural Marketing Service, Department of Agriculture (2025). Livestock and Grain Market News Search [Dataset]. https://catalog.data.gov/dataset/livestock-and-grain-market-news-search
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Marketing Service (https://www.ams.usda.gov/)
    Description

    The primary function of the Livestock and Grain Market News Division of the Livestock and Seed Program (LSP) is to compile and disseminate information that will aid producers, consumers, and distributors in the sale and purchase of livestock, meat, grain, and their related products nationally and internationally.

  18. Metadata and index on social science research data

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Thomas Krämer (2020). Metadata and index on social science research data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_896430
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    gesis
    Authors
    Thomas Krämer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains metadata on social science research data.

    It is a snapshot of the metadata used for the creation of the search index for http://datasearch.gesis.org.

    Metadata-DC.20170713.tar.gz contains the raw XML Dublin Core metadata harvested via OAI-PMH.

    elasticsearch-2.4.4.-snapshot-20170713.tar.gz contains a corresponding Elasticsearch v2.4.4 index.

  19. Benchmark for Relational Keyword Search

    • dataverse.lib.virginia.edu
    bz2
    Updated Nov 21, 2017
    Cite
    Joel Coffman; Alfred C. Weaver (2017). Benchmark for Relational Keyword Search [Dataset]. http://doi.org/10.18130/V3/KEVCF8
    Explore at:
    Available download formats: bz2(1326), bz2(5351), bz2(162170), bz2(2529), bz2(2113), bz2(38195824), bz2(237979371), bz2(2764), bz2(8495), bz2(4969), bz2(8652), bz2(9058), bz2(118471416), bz2(588132446)
    Dataset updated
    Nov 21, 2017
    Dataset provided by
    University of Virginia Dataverse
    Authors
    Joel Coffman; Alfred C. Weaver
    License

    https://dataverse.lib.virginia.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.18130/V3/KEVCF8

    Description

    The benchmark for relational keyword search is a collection of data sets, queries, and relevance assessments designed to facilitate the evaluation of systems supporting keyword search in databases. The benchmark includes three separate data sets with fifty information needs (i.e., queries) for each data set and follows the traditional approach to evaluate keyword search systems (i.e., ad hoc retrieval) developed by the information retrieval (IR) research community.

  20. Significant cross-correlation coefficients of individual and averaged...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Ulrich S. Tran; Rita Andel; Thomas Niederkrotenthaler; Benedikt Till; Vladeta Ajdacic-Gross; Martin Voracek (2023). Significant cross-correlation coefficients of individual and averaged time-series data. [Dataset]. http://doi.org/10.1371/journal.pone.0183149.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ulrich S. Tran; Rita Andel; Thomas Niederkrotenthaler; Benedikt Till; Vladeta Ajdacic-Gross; Martin Voracek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Significant cross-correlation coefficients of individual and averaged time-series data.

Cite
Growth Market Reports (2025). AI Dataset Search Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-dataset-search-platform-market

AI Dataset Search Platform Market Research Report 2033

Explore at:
Available download formats: pptx, pdf, csv
Dataset updated
Aug 21, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description

AI Dataset Search Platform Market Outlook



According to our latest research, the global AI Dataset Search Platform market size is valued at USD 1.18 billion in 2024, with a robust year-over-year expansion driven by the escalating demand for high-quality datasets to fuel artificial intelligence and machine learning initiatives across industries. The market is expected to grow at a CAGR of 22.6% from 2025 to 2033, reaching an estimated USD 9.62 billion by 2033. This exponential growth is primarily attributed to the increasing recognition of data as a strategic asset, the proliferation of AI applications across sectors, and the need for efficient, scalable, and secure platforms to discover, curate, and manage diverse datasets.
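
The relationship between the base value, the growth rate, and the end value quoted above can be checked with the standard compound-growth formula. The following is a minimal sketch, not the report's own methodology: the function names are hypothetical, and it assumes 2024 is the base year, so that 2033 lies nine compounding periods ahead.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound a base value forward by `years` periods at rate `cagr`."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Return the constant annual growth rate that links `start` to `end`."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted in the description above (USD billions).
BASE_2024 = 1.18
TARGET_2033 = 9.62
STATED_CAGR = 0.226

# Assumption: 2024 is the base year, so 2033 is 9 compounding periods away.
YEARS = 2033 - 2024

if __name__ == "__main__":
    projected = project_value(BASE_2024, STATED_CAGR, YEARS)
    rate = implied_cagr(BASE_2024, TARGET_2033, YEARS)
    print(f"Projected 2033 value at the stated CAGR: {projected:.2f}")
    print(f"CAGR implied by the 2024 and 2033 values: {rate:.1%}")
```

The output depends on the assumed base year and number of periods, so it need not match the quoted figures exactly; changing `YEARS` shows how sensitive the projection is to that choice.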



One of the primary growth factors propelling the AI Dataset Search Platform market is the exponential surge in AI adoption across both public and private sectors. Businesses and institutions are increasingly leveraging AI to gain competitive advantages, enhance operational efficiencies, and deliver personalized experiences. However, the effectiveness of AI models is fundamentally reliant on the quality and diversity of training datasets. As organizations strive to accelerate their AI initiatives, the need for platforms that can efficiently search, aggregate, and validate datasets from disparate sources has become paramount. This has led to a significant uptick in investments in AI dataset search platforms, as they enable faster data discovery, reduce development cycles, and ensure compliance with data governance standards.



Another key driver for the market is the growing complexity and volume of data generated from emerging technologies such as IoT, edge computing, and connected devices. The sheer scale and heterogeneity of data sources necessitate advanced search platforms equipped with intelligent indexing, semantic search, and metadata management capabilities. These platforms not only facilitate the identification of relevant datasets but also support data annotation, labeling, and preprocessing, which are critical for building robust AI models. Furthermore, the integration of AI-powered search algorithms within these platforms enhances the accuracy and relevance of search results, thereby improving the overall efficiency of data scientists and AI practitioners.



Additionally, regulatory pressures and the increasing emphasis on ethical AI have underscored the importance of transparent and auditable data sourcing. Organizations are compelled to demonstrate the provenance and integrity of the datasets used in their AI models to mitigate risks related to bias, privacy, and compliance. AI dataset search platforms address these challenges by providing traceability, version control, and access management features, ensuring that only authorized and compliant datasets are utilized. This not only reduces legal and reputational risks but also fosters trust among stakeholders, further accelerating market adoption.



From a regional perspective, North America dominates the AI Dataset Search Platform market in 2024, accounting for over 38% of the global revenue. This leadership is driven by the presence of major technology providers, a mature AI ecosystem, and substantial investments in research and development. Europe follows closely, benefiting from stringent data privacy regulations and strong government support for AI innovation. The Asia Pacific region is experiencing the fastest growth, propelled by rapid digital transformation, expanding AI research communities, and increasing government initiatives to foster AI adoption. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions gradually embrace AI-driven solutions.





Component Analysis



The AI Dataset Search Platform market by component is segmented into platforms and services, each playing a pivotal role in the ecosystem. The platform segment encompasses the core software infrastructure that enables users to search, index, curate, and manage datasets. This segmen
