62 datasets found
  1. DataForSEO Google Full (Keywords+SERP) database, historical data available

    • datarade.ai
    .json, .csv
    Updated Aug 17, 2023
    Cite
    DataForSEO (2023). DataForSEO Google Full (Keywords+SERP) database, historical data available [Dataset]. https://datarade.ai/data-products/dataforseo-google-full-keywords-serp-database-historical-d-dataforseo
    Dataset provided by
    Authors
    DataForSEO
    Area covered
    Burkina Faso, Portugal, United Kingdom, Côte d'Ivoire, Cyprus, Paraguay, Sweden, Costa Rica, South Africa, Bolivia (Plurinational State of)
    Description

    You can check the fields description in the documentation: current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; Historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.

    Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.

    Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.

    Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.

    This database is available in JSON format only.

    You don’t have to download fresh data dumps in JSON; we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.

  2. Evolution of Web search engine interfaces through SERP screenshots and HTML...

    • rdm.inesctec.pt
    Updated Jul 26, 2021
    Cite
    (2021). Evolution of Web search engine interfaces through SERP screenshots and HTML complete pages for 20 years - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2021-003
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of “10 blue links” has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. We used the most searched queries by year to extract a representative sample of SERP from the Internet Archive. The Internet Archive has been keeping snapshots and the respective HTML version of webpages over time, and its collection contains more than 50 billion webpages. We used Python and Selenium Webdriver, for browser automation, to visit each capture online, check if the capture is valid, save the HTML version, and generate a full screenshot. The dataset contains all the extracted captures. Each capture is represented by a screenshot, an HTML file, and a files' folder. We concatenate the initial of the search engine (G) with the capture's timestamp for file naming. The filename ends with a sequential integer "-N" if the timestamp is repeated. For example, "G20070330145203-1" identifies a second capture from Google on March 30, 2007. The first is identified by "G20070330145203". Using this dataset, we analyzed how SERP evolved in terms of content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We have registered the appearance of SERP features and analyzed the design patterns involved in each SERP component. We found that the number of elements in SERP has been rising over the years, demanding a more extensive interface area and larger files. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, web design. We expect this work will trigger other, more specific studies that can take advantage of the dataset we provide here. This graphic represents the diversity of captures by year and search engine (Google and Bing).
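
    The file naming scheme described above (engine initial, 14-digit timestamp, optional "-N" suffix) can be decoded with a short helper. This is a minimal sketch, not part of the dataset's tooling: the function and class names are hypothetical, and the "B" initial for Bing is an assumption, since the description only shows "G" for Google.

```python
import re
from datetime import datetime
from typing import NamedTuple

# "G" for Google is taken from the description; "B" for Bing is an assumption.
ENGINES = {"G": "Google", "B": "Bing"}

# One capital letter, a 14-digit timestamp (YYYYMMDDhhmmss), optional "-N".
CAPTURE_RE = re.compile(r"^([A-Z])(\d{14})(?:-(\d+))?$")


class Capture(NamedTuple):
    engine: str
    timestamp: datetime
    repeat: int  # 0 for the first capture at a timestamp, N for a "-N" suffix


def parse_capture_name(name: str) -> Capture:
    """Split a capture name such as 'G20070330145203-1' into its parts."""
    match = CAPTURE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized capture name: {name!r}")
    initial, stamp, suffix = match.groups()
    return Capture(
        engine=ENGINES.get(initial, initial),
        timestamp=datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        repeat=int(suffix) if suffix else 0,
    )
```

    Under these assumptions, parse_capture_name("G20070330145203-1") yields a Google capture taken on March 30, 2007 at 14:52:03 with repeat index 1.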

  3. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Mar 1, 2023
    Cite
    Fabian Haak; Philipp Schaer (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. http://doi.org/10.5281/zenodo.7682915
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fabian Haak; Philipp Schaer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles.
    Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server, whose location is saved in "location".
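
    To illustrate the column layout just described, the sketch below reads rows with Python's csv module and counts suggestions per search engine. The column names come from the description; the comma delimiter, the column order, and the sample row's rank, datetime, and location values are assumptions made up for illustration, not data from suggestions.csv.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for suggestions.csv. Only the column names and the
# "democrats" example are from the description; the rest is made up.
SAMPLE = (
    "root_term,query_input,query_suggestion,rank,search_engine,datetime,location\n"
    "democrats,democrats a,democrats and recession,1,google,2022-11-01 12:00:00,US\n"
)


def count_by_engine(handle):
    """Count suggestion rows per value of the 'search_engine' column."""
    reader = csv.DictReader(handle)
    return Counter(row["search_engine"] for row in reader)


counts = count_by_engine(io.StringIO(SAMPLE))
```

    For the real file, the same function could be given an open file handle instead of the in-memory sample.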

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
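
    The query expansion described above can be sketched as follows: each root term is submitted once on its own and once with each letter a to z appended, giving 27 inputs and, at ten suggestions per input, at most 270 suggestions per topic and search engine. The function name is hypothetical and this is not the authors' scraper.

```python
import string

SUGGESTIONS_PER_INPUT = 10  # ten suggestions per query input, per the description


def query_inputs(root_term):
    """Return the root term followed by the root term extended with a..z."""
    extended = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + extended


inputs = query_inputs("democrats")
max_suggestions = len(inputs) * SUGGESTIONS_PER_INPUT  # 27 inputs * 10 = 270
```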

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.

  4. Google Trends and Wikipedia Page Views

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Jan 24, 2020
    Cite
    Mitsuo Yoshida (2020). Google Trends and Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mitsuo Yoshida
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract (our paper)

    The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.

    Data

    personal-name.txt.gz:
    The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.

    personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz:
    The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.
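
    The six-column layout of the two data files can be mapped onto named fields as in the sketch below. The field names are hypothetical labels for the columns listed above, and the tab delimiter is an assumption, since the description does not state how columns are separated.

```python
from typing import NamedTuple


class TrendRow(NamedTuple):
    period: str      # period to be collected
    source: str      # "Google" or "Wikipedia"
    article_id: str  # Wikipedia article id
    keyword: str     # search keyword
    date: str        # date of the observation
    value: float     # search trend value or page view count


def parse_trend_line(line: str, sep: str = "\t") -> TrendRow:
    """Parse one line of the six-column files; the separator is assumed."""
    period, source, article_id, keyword, date, value = line.rstrip("\n").split(sep)
    return TrendRow(period, source, article_id, keyword, date, float(value))
```

    The .txt.gz files could be read line by line with gzip.open(path, "rt", encoding="utf-8") from the standard library, passing each line to this function.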

    Publication

    This data set was created for our study. If you make use of this data set, please cite:
    Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
    http://dx.doi.org/10.1145/2786451.2786495
    http://arxiv.org/abs/1509.02218 (author-created version)

    Note

    The raw data of Wikipedia page views is available in the following page.
    http://dumps.wikimedia.org/other/pagecounts-raw/

  5. Search Engineing Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Search Engineing Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/search-engine-marketing-market
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Search Engine Market Outlook



    The search engine market size was valued at approximately USD 124 billion in 2023 and is projected to reach USD 258 billion by 2032, witnessing a robust CAGR of 8.5% during the forecast period. This growth is largely attributed to the increasing reliance on digital platforms and the internet across various sectors, which has necessitated the use of search engines for data retrieval and information dissemination. With the proliferation of smartphones and the expansion of internet access globally, search engines have become indispensable tools for both businesses and consumers, driving the market's upward trajectory. The integration of artificial intelligence and machine learning technologies into search engines is transforming the way search engines operate, offering more personalized and efficient search results, thereby further propelling market growth.
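
    As a quick consistency check on the figures above, the compound annual growth rate implied by USD 124 billion in 2023 and USD 258 billion in 2032 can be computed directly. This is a minimal sketch using the standard CAGR formula; the function name is illustrative, and the nine-year span (2023 to 2032) is an assumption about how the report counts periods.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# USD 124 billion (2023) to USD 258 billion (2032), i.e. nine annual periods.
rate = cagr(124, 258, 2032 - 2023)
print(f"{rate:.1%}")  # prints 8.5%
```

    The result agrees with the 8.5% CAGR quoted in the report.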



    One of the primary growth factors in the search engine market is the ever-increasing digitalization across industries. As businesses continue to transition from traditional modes of operation to digital platforms, the need for search engines to navigate and manage data becomes paramount. This shift is particularly evident in industries such as retail, BFSI, and healthcare, where vast amounts of data are generated and require efficient management and retrieval systems. The integration of AI and machine learning into search engine algorithms has enhanced their ability to process and interpret large datasets, thereby improving the accuracy and relevance of search results. This technological advancement not only improves user experience but also enhances the competitive edge of businesses, further fueling market growth.



    Another significant growth factor is the expanding e-commerce sector, which relies heavily on search engines to connect consumers with products and services. With the rise of e-commerce giants and online marketplaces, consumers are increasingly using search engines to find the best prices, reviews, and availability of products, leading to a surge in search engine usage. Additionally, the implementation of voice search technology and the growing popularity of smart home devices have introduced new dynamics to search engine functionality. Consumers are now able to conduct searches verbally, which has necessitated the adaptation of search engines to incorporate natural language processing capabilities, further driving market growth.



    The advertising and marketing sectors are also contributing significantly to the growth of the search engine market. Businesses are leveraging search engines as a primary tool for online advertising, given their wide reach and ability to target specific audiences. Pay-per-click advertising and search engine optimization strategies have become integral components of digital marketing campaigns, enabling businesses to enhance their visibility and engagement with potential customers. The measurable nature of these advertising techniques allows businesses to assess the effectiveness of their campaigns and make data-driven decisions, thereby increasing their reliance on search engines and contributing to overall market growth.



    The evolution of search engines is closely tied to the development of AI Enterprise Search, which is revolutionizing how businesses access and utilize information. AI Enterprise Search leverages artificial intelligence to provide more accurate and contextually relevant search results, making it an invaluable tool for organizations that manage large volumes of data. By understanding user intent and learning from past interactions, AI Enterprise Search systems can deliver personalized experiences that enhance productivity and decision-making. This capability is particularly beneficial in sectors such as finance and healthcare, where quick access to precise information is crucial. As businesses continue to digitize and data volumes grow, the demand for AI Enterprise Search solutions is expected to increase, further driving the growth of the search engine market.



    Regionally, North America holds a significant share of the search engine market, driven by the presence of major technology companies and a well-established digital infrastructure. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. This growth can be attributed to the rapid digital transformation in emerging economies such as China and India, where increasing internet penetration and smartphone adoption are driving demand for search engines. Additionally, government initiatives to

  6. Data from: S1 Dataset -

    • figshare.com
    • plos.figshare.com
    xlsx
    Updated Jan 24, 2025
    Cite
    Muath Saad Alassaf; Ayman Bakkari; Jehad Saleh; Abdulsamad Habeeb; Bashaer Fahad Aljuhani; Ahmad A. Qazali; Ahmed Yaseen Alqutaibi (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0312832.s002
    Dataset provided by
    PLOS ONE
    Authors
    Muath Saad Alassaf; Ayman Bakkari; Jehad Saleh; Abdulsamad Habeeb; Bashaer Fahad Aljuhani; Ahmad A. Qazali; Ahmed Yaseen Alqutaibi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This study aimed to investigate the quality and readability of online English health information about dental sensitivity and how patients evaluate and utilize this web-based information.

    Methods: Health information was obtained from three search engines and assessed for credibility and readability. We conducted searches in "incognito" mode to reduce the possibility of biases. Quality assessment utilized JAMA benchmarks, the DISCERN tool, and HONcode. Readability was analyzed using the SMOG, FRE, and FKGL indices.

    Results: Out of 600 websites, 90 were included, 62.2% of which were affiliated with dental or medical centers; among these websites, 80% related exclusively to dental implant treatments. Regarding JAMA benchmarks, currency was the most commonly achieved and 87.8% of websites fell into the "moderate quality" category. Word and sentence counts ranged widely with a mean of 815.7 (±435.4) and 60.2 (±33.3), respectively. FKGL averaged 8.6 (±1.6), SMOG scores averaged 7.6 (±1.1), and the FRE scale showed a mean of 58.28 (±9.1), with "fair difficult" being the most common category.

    Conclusion: The overall evaluation using DISCERN indicated a moderate quality level, with a notable absence of referencing. JAMA benchmarks revealed a general non-adherence among websites, as none of the websites met all four criteria. Only one website was HONcode certified, suggesting a lack of reliable sources for web-based health information accuracy. Readability assessments showed varying results, with the majority being "fair difficult". Although readability did not significantly differ across affiliations, word and sentence counts varied widely between them.
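
    For readers unfamiliar with the three readability indices named above, the sketch below implements their standard published formulas from raw counts. It is an illustration only and not the study's tooling; counting words, sentences, syllables, and polysyllabic words is left to the caller, and the function names are hypothetical.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """FRE: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL: approximate U.S. school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables, sentences):
    """SMOG: grade level from the count of words with 3+ syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291
```

    For example, a text with 100 words, 10 sentences, and 150 syllables gives an FRE of about 69.8 and an FKGL of about 6.0.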

  7. Data from: Inventory of online public databases and repositories holding...

    • catalog.data.gov
    • s.cnmilf.com
    • +4more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

    • establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
    • compare how much data is in institutional vs. domain-specific vs. federal platforms
    • determine which repositories are recommended by top journals that require or recommend the publication of supporting data
    • ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

    We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

    Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

    Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review:

    • Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
    • There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    • Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See included README file for descriptions of each individual data file in this dataset.

    Resources in this dataset:

    • Resource Title: Journals. File Name: Journals.csv
    • Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    • Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    • Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    • Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    • Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    • Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
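
    The evaluation thresholds described in this entry (a search term's share of a repository's total collection reaching 1% or 5%, and a term returning more than 100 or more than 500 results) can be expressed as a small helper. This is a sketch under assumptions: the function name and the example counts are hypothetical and are not taken from the dataset.

```python
def evaluate_term(term_hits, total_datasets):
    """Flag a search term against the share and result-count thresholds."""
    share = term_hits / total_datasets
    return {
        "share": share,
        "at_least_1_percent": share >= 0.01,
        "at_least_5_percent": share >= 0.05,
        "over_100_results": term_hits > 100,
        "over_500_results": term_hits > 500,
    }


# Hypothetical repository: 600 hits for one term out of 10,000 datasets.
flags = evaluate_term(600, 10_000)
```

    With these made-up numbers the term accounts for 6% of the collection, so all four flags are true.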

  8. How to Login DuckDuckGo Account? | A Step-By-Step Guide Dataset

    • paperswithcode.com
    Updated Jun 17, 2025
    Cite
    (2025). How to Login DuckDuckGo Account? | A Step-By-Step Guide Dataset [Dataset]. https://paperswithcode.com/dataset/how-to-login-duckduckgo-account-a-step-by
    Description


    In today’s digital age, privacy has become one of the most valued aspects of online activity. With increasing concerns over data tracking, surveillance, and targeted advertising, users are turning to privacy-first alternatives for everyday browsing. One of the most recognized names in private search is DuckDuckGo. Unlike mainstream search engines, DuckDuckGo emphasizes anonymity and transparency. However, many people wonder: Is there such a thing as a DuckDuckGo login account (https://duckduckgo-account.blogspot.com/)?

    In this comprehensive guide, we’ll explore everything you need to know about the DuckDuckGo login account, what it offers (or doesn’t), and how to get the most out of DuckDuckGo’s privacy features.

    Does DuckDuckGo Offer a Login Account?

    To clarify up front: DuckDuckGo does not require or offer a traditional login account like Google or Yahoo. The concept of a DuckDuckGo login account is somewhat misleading if interpreted through the lens of typical internet services.

    DuckDuckGo's entire business model is built around privacy. The company does not track users, store personal information, or create user profiles. As a result, there’s no need—or intention—to implement a system that asks users to log in. This stands in stark contrast to other search engines that rely on login-based ecosystems to collect and use personal data for targeted ads.

    That said, some users still search for the term DuckDuckGo login account, usually because they’re trying to save settings, sync devices, or use features that may suggest a form of account system. Let’s break down what’s possible and what alternatives exist within DuckDuckGo’s platform.

    Saving Settings Without a DuckDuckGo Login Account

    Even without a traditional DuckDuckGo login account, users can still save their preferences. DuckDuckGo provides two primary ways to retain search settings:

    Local Storage (Cookies)

    When you customize your settings on the DuckDuckGo account homepage, such as theme, region, or safe search options, those preferences are stored in your browser’s local storage. As long as you don’t clear cookies or use incognito mode, these settings will persist.

    Cloud Save Feature

    To cater to users who want to retain settings across multiple devices without a DuckDuckGo login account, DuckDuckGo offers a feature called "Cloud Save." Instead of creating an account with a username or password, you generate a passphrase or unique key. This key can be used to retrieve your saved settings on another device or browser.

    While it’s not a conventional login system, it’s the closest DuckDuckGo comes to offering account-like functionality—without compromising privacy.

    Why DuckDuckGo Avoids Login Accounts

    Understanding why there is no DuckDuckGo login account comes down to the company’s core mission: to offer a private, non-tracking search experience. Introducing login accounts would:

    • Require collecting some user data (e.g., email, password)
    • Introduce potential tracking mechanisms
    • Undermine their commitment to full anonymity

    By avoiding a login system, DuckDuckGo keeps user trust intact and continues to deliver on its promise of complete privacy. For users who value anonymity, the absence of a DuckDuckGo login account is actually a feature, not a flaw.

    DuckDuckGo and Device Syncing

    One of the most commonly searched reasons behind the term DuckDuckGo login account is the desire to sync settings or preferences across multiple devices. Although DuckDuckGo doesn’t use accounts, the Cloud Save feature mentioned earlier serves this purpose without compromising security or anonymity.

    You simply export your settings using a unique passphrase on one device, then import them using the same phrase on another. This offers similar benefits to a synced account—without the need for usernames, passwords, or emails.

    DuckDuckGo Privacy Tools Without a Login

    DuckDuckGo is more than just a search engine. It also offers a range of privacy tools, all without needing a DuckDuckGo login account:

    DuckDuckGo Privacy Browser (Mobile): Available for iOS and Android, this browser includes tracking protection, forced HTTPS, and built-in private search.

    DuckDuckGo Privacy Essentials (Desktop Extension): For Chrome, Firefox, and Edge, this extension blocks trackers, grades websites on privacy, and enhances encryption.

    Email Protection: DuckDuckGo recently launched a service that allows users to create "@duck.com" email addresses that forward to their real email—removing trackers in the process. Users sign up for this using a token or limited identifier, but it still doesn’t constitute a full DuckDuckGo login account.

    Is a DuckDuckGo Login Account Needed?

    For most users, the absence of a DuckDuckGo login account is not only acceptable but ideal. Without an account, you can:

    • Use the search engine privately
    • Customize and save settings
    • Sync preferences across devices
    • Block trackers and protect email

    While some people may find the lack of a traditional login unfamiliar at first, it quickly becomes a refreshing break from constant credential requests, data tracking, and login fatigue.

    The Future of DuckDuckGo Accounts

    As of now, DuckDuckGo maintains its position against traditional account systems. However, it’s clear the company is exploring privacy-preserving ways to offer more user features, such as Email Protection and Cloud Save. These features may continue to evolve, but the core commitment remains: no tracking, no personal data storage, and no typical DuckDuckGo login account.

    Final Thoughts

    While the term DuckDuckGo login account is frequently searched, it represents a misunderstanding of how the platform operates. Unlike other tech companies that monetize personal data, DuckDuckGo has stayed true to its promise of privacy.

  9. Indian Pharmaceutical Products

    • kaggle.com
    Updated Jun 19, 2025
    Cite
    Rishgeeky (2025). Indian Pharmaceutical Products [Dataset]. http://doi.org/10.34740/kaggle/ds/7699271
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    Kaggle
    Authors
    Rishgeeky
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains a comprehensive collection of over 250,000 pharmaceutical products available in India, including details like medicine name, price (INR), manufacturer, packaging, and active compositions.

    Each entry reflects structured real-world pharmaceutical product data, useful for analyzing trends in medicine pricing, formulations, discontinued products, and market competition. The dataset was cleaned to remove duplicates, extract quantities from packaging labels, and enrich fields like medicine form and composition structure.

    Columns Included:

    • id: Unique ID for each medicine
    • name: Brand name of the drug
    • price_inr: Retail price in Indian Rupees
    • is_discontinued: Whether the product is active or discontinued
    • manufacturer_name: Drug manufacturing company
    • packaging: Original packaging info (e.g., "strip of 10 tablets")
    • pack_quantity: Number or volume extracted from packaging
    • pack_unit: Unit of measurement (e.g., tablets, ml)
    • active_ingredient_1 & active_ingredient_2: Composition of the medicine
    • medicine_form: Extracted form such as Tablet, Syrup, Injection, etc.

    Possible Use Cases:

    • Analyzing drug price variations across manufacturers
    • Identifying top manufacturers or most common drug compositions
    • Drug recommendation or search engine (based on active ingredients)
    • Research in pharmacoeconomics, generic vs. branded pricing
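    The first use case above (price variation across manufacturers) can be sketched with the standard library alone. Only the column names come from the dataset description; the sample rows, the "True"/"False" encoding of is_discontinued, and the helper name price_stats_by_manufacturer are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE_CSV = """id,name,price_inr,is_discontinued,manufacturer_name,pack_quantity,pack_unit
1,Brand A 500,20.0,False,Maker One,10,tablets
2,Brand B 500,35.0,False,Maker Two,10,tablets
3,Brand C 250,12.0,True,Maker One,10,tablets
4,Brand D 650,30.0,False,Maker One,15,tablets
"""


def price_stats_by_manufacturer(rows):
    """Return {manufacturer: (min, mean, max)} of per-unit price for active products."""
    per_unit = defaultdict(list)
    for row in rows:
        # Assumption: discontinued products are flagged with the string "True".
        if row["is_discontinued"].strip().lower() == "true":
            continue
        quantity = float(row["pack_quantity"])
        if quantity <= 0:
            continue  # cannot normalise a price without a positive pack size
        per_unit[row["manufacturer_name"]].append(float(row["price_inr"]) / quantity)
    return {m: (min(p), mean(p), max(p)) for m, p in per_unit.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = price_stats_by_manufacturer(rows)
for manufacturer, (low, avg, high) in sorted(stats.items()):
    print(f"{manufacturer}: min={low:.2f} mean={avg:.2f} max={high:.2f} INR per unit")
```

    Dividing price_inr by pack_quantity is one possible normalisation; it only makes sense when comparing rows that share the same pack_unit.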

    Disclaimer: This dataset is compiled for educational and analytical use only. It does not provide medical advice or endorsements.

  10. China Consumer Interest from Baidu Search Index Analytics | Online Search...

    • datarade.ai
    .json, .csv
    Updated Apr 1, 2024
    Cite
    Datago Technology Limited (2024). China Consumer Interest from Baidu Search Index Analytics | Online Search Trends Data | 3000+ Global Consumer Bands | Daily Update [Dataset]. https://datarade.ai/data-products/china-consumer-interest-from-baidu-search-index-analytics-o-datago-technology-limited
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Apr 1, 2024
    Dataset authored and provided by
    Datago Technology Limited
    Area covered
    China
    Description

    Baidu Search Index is a big data analytics tool developed by Baidu to track changes in keyword search popularity within its search engine. By analyzing trends in the Baidu Search Index for specific keywords, users can effectively monitor public interest in topics, companies, or brands.

    As an ecosystem partner of Baidu Index, Datago has direct access to keyword search index data from Baidu's database, leveraging this information to build the BSIA-Consumer. This database encompasses popular brands that are actively searched by Chinese consumers, along with their commonly used names. By tracking Baidu Index search trends for these keywords, Datago precisely maps them to their corresponding publicly listed stocks.

    The database covers over 1,100 consumer stocks and 3,000+ brand keywords across China, the United States, Europe, and Japan, with a particular focus on popular sectors like luxury goods and vehicles. Through its analysis of Chinese consumer search interest, this database offers investors a unique perspective on market sentiment, consumer preferences, and brand influence, including:

    • Brand Influence Tracking – By leveraging Baidu Search Index data, investors can assess the level of consumer interest in various brands, helping to evaluate their influence and trends within the Chinese market.

    • Consumer Stock Mapping – BSIA-consumer provides an accurate linkage between brand keywords and their associated consumer stocks, enabling investor analysis driven by consumer interest.

    • Coverage of Popular Consumer Goods – BSIA-consumer focuses specifically on trending sectors like luxury goods and vehicles, offering valuable insights into these industries.

    • Coverage: 1000+ consumer stocks

    • History: 2016-01-01

    • Update Frequency: Daily

  11. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database Management Systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  12. Dataset used to guide the development of Scout and bechmark XL-MS search...

    • data.niaid.nih.gov
    xml
    Updated Aug 3, 2024
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Available download formats: xml
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    Leibniz-Forschungsinstitut for Molecular Pharmacology
    Leibniz-Forschungsinstitut fuer Molekulare Pharmakologie
    Authors
    Max Ruwolt; Fan Liu
    Variables measured
    Proteomics
    Description

    This submission includes the raw data analyzed and search results described in our manuscript “Proteome-Scale Recombinant Standards And A Robust High-Speed Search Engine To Advance Cross-Linking MS-Based Interactomics”. In this study, we develop a strategy to generate a well-controlled XL-MS standard by systematically mixing and cross-linking recombinant proteins. The standard can be split into independent datasets, each of which has the MS2-level complexity of a typical proteome-wide XL-MS experiment. The raw datasets included in this submission were used to (1) guide the development of Scout, a machine learning-based search engine for XL-MS with MS-cleavable cross-linkers (batch 1), (2) test different LC-MS acquisition methods (batch 2), and (3) directly compare Scout to widely used XL-MS search engines (batches 3 and 4).

  13. healthsearchqa

    • huggingface.co
    Updated Mar 7, 2024
    Cite
    AISC Team D1 (2024). healthsearchqa [Dataset]. https://huggingface.co/datasets/aisc-team-d1/healthsearchqa
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2024
    Dataset authored and provided by
    AISC Team D1
    License

    https://choosealicense.com/licenses/unknown/

    Description

    HealthSearchQA

    Dataset of consumer health questions released by Google for the Med-PaLM paper (arXiv preprint). From the paper: We curated our own additional dataset consisting of 3,173 commonly searched consumer questions, referred to as HealthSearchQA. The dataset was curated using seed medical conditions and their associated symptoms. We used the seed data to retrieve publicly-available commonly searched questions generated by a search engine, which were displayed to all users… See the full description on the dataset page: https://huggingface.co/datasets/aisc-team-d1/healthsearchqa.

  14. MSLR WEB30K Dataset

    • paperswithcode.com
    Updated Apr 14, 2025
    Cite
    Tao Qin; Tie-Yan Liu (2025). MSLR WEB30K Dataset [Dataset]. https://paperswithcode.com/dataset/mslr-web30k
    Explore at:
    Dataset updated
    Apr 14, 2025
    Authors
    Tao Qin; Tie-Yan Liu
    Description

    The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels:

    (1) The relevance judgments are obtained from a retired labeling set of a commercial web search engine (Microsoft Bing), which take 5 values from 0 (irrelevant) to 4 (perfectly relevant).

    (2) The features are basically extracted by us, and are those widely used in the research community.

    In the data files, each row corresponds to a query-url pair. The first column is the relevance label of the pair, the second column is the query id, and the following columns are features. The larger the relevance label, the more relevant the query-url pair. A query-url pair is represented by a 136-dimensional feature vector.
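    A minimal sketch of reading one such row follows. The description above only gives the column order; the concrete text layout used here (label, then qid:<id>, then <index>:<value> pairs, as in SVMlight/LETOR-style files) is an assumption, and parse_letor_line and the sample row are hypothetical.

```python
def parse_letor_line(line):
    """Parse '<label> qid:<id> <index>:<value> ...' into (label, qid, features)."""
    # Drop an optional trailing '# comment' before tokenising.
    tokens = line.split("#", 1)[0].split()
    label = int(tokens[0])              # relevance label, 0 (irrelevant) to 4
    qid = tokens[1].split(":", 1)[1]    # query id, kept as a string
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features


# Hypothetical row showing only 3 of the 136 features.
sample = "2 qid:10 1:3 2:0.5 136:0.25"
label, qid, features = parse_letor_line(sample)
print(label, qid, features)
```

    Keeping features in a dict keyed by feature index avoids assuming that every row lists all 136 features; a dense vector can be built from it once that is confirmed for the files at hand.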

  15. msmarco-msmarco-distilbert-base-v3

    • huggingface.co
    Updated Nov 19, 2014
    Cite
    Sentence Transformers (2014). msmarco-msmarco-distilbert-base-v3 [Dataset]. https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 19, 2014
    Dataset authored and provided by
    Sentence Transformers
    Description

    MS MARCO with hard negatives from msmarco-distilbert-base-v3

    MS MARCO is a large-scale information retrieval corpus that was created based on real user search queries using the Bing search engine. For each query and gold positive passage, the 50 most similar paragraphs were mined using 13 different models. The resulting data can be used to train Sentence Transformer models.

      Related Datasets
    

    These are the datasets generated using the 13 different models:

    msmarco-bm25… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3.

  16. The State of Serverless Applications: Collection,Characterization, and...

    • zenodo.org
    zip
    Updated Aug 12, 2021
    Cite
    Simon Eismann; Joel Scheuner; Erwin van Eyk; Maximilian Schwinger; Johannes Grohmann; Nikolas Herbst; Cristina Abad (2021). The State of Serverless Applications: Collection,Characterization, and Community Consensus - Replication Package [Dataset]. http://doi.org/10.5281/zenodo.5185055
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 12, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simon Eismann; Joel Scheuner; Erwin van Eyk; Maximilian Schwinger; Johannes Grohmann; Nikolas Herbst; Cristina Abad
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The replication package for our article The State of Serverless Applications: Collection,Characterization, and Community Consensus provides everything required to reproduce all results for the following three studies:

    • Serverless Application Collection
    • Serverless Application Characterization
    • Comparison Study

    Serverless Application Collection

    We collect descriptions of serverless applications from open-source projects, academic literature, industrial literature, and scientific computing.

    Open-source Applications

    As a starting point, we used an existing data set on open-source serverless projects from this study. We removed small and inactive projects based on the number of files, commits, contributors, and watchers. Next, we manually filtered the resulting data set to include only projects that implement serverless applications. We provide a table containing all projects that remained after the filtering alongside the notes from the manual filtering.

    Academic Literature Applications

    We based our search on an existing community-curated dataset on literature for serverless computing consisting of over 180 peer-reviewed articles. First, we filtered the articles based on title and abstract. In a second iteration, we filtered out any articles that implement only a single function for evaluation purposes or do not include sufficient detail to enable a review. As the authors were familiar with some additional publications describing serverless applications, we contributed them to the community-curated dataset and included them in this study. We provide a table with our notes from the manual filtering.

    Scientific Computing Applications

    Most of these scientific computing serverless applications are still at an early stage and therefore there is little public data available. One of the authors is employed at the German Aerospace Center (DLR) at the time of writing, which allowed us to collect information about several projects at DLR that are either currently moving to serverless solutions or are planning to do so. Additionally, an application from the German Electron Synchrotron (DESY) could be included. For each of these scientific computing applications, we provide a document containing a description of the project and the names of our contacts that provided information for the characterization of these applications.

    • SC1 Copernicus Sentinel-1 for near-real-time water monitoring
    • SC2 Reprocessing Sentinel 5 Precursor data with ProEO
    • SC3 High-Performance Data Analytics for Earth Observation
    • SC4 Tandem-L exploitation platform
    • SC5 Global Urban Footprint
    • SC6 DESY - High Throughput Data Taking

    Collection of serverless applications

    Based on the previously described methodology, we collected a diverse dataset of 89 serverless applications from open-source projects, academic literature, industrial literature, and scientific computing. This dataset can be found in Dataset.xlsx.

    Serverless Application Characterization

    As previously described, we collected 89 serverless applications from four different sources. Subsequently, two randomly assigned reviewers out of seven available reviewers characterized each application along 22 characteristics in a structured collaborative review sheet. The characteristics and potential values were defined a priori by the authors and iteratively refined, extended, and generalized during the review process. The initial moderate inter-rater agreement was followed by a discussion and consolidation phase, where all differences between the two reviewers were discussed and resolved. The six scientific applications were not publicly available and therefore characterized by a single domain expert, who is either involved in the development of the applications or in direct contact with the development team.

    Initial Ratings & Interrater Agreement Calculation

    The initial reviews are available as a table, where every application is characterized along the 22 characteristics. A single value indicates that both reviewers assigned the same value, whereas a value of the form [Reviewer 2] A | [Reviewer 4] B indicates that for this characteristic, reviewer two assigned the value A, whereas reviewer four assigned the value B.
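    The cell convention described above can be decoded with a small helper. This is an illustrative sketch, not part of the replication package: parse_cell is a hypothetical name, and the pattern assumes reviewer tags look exactly like "[Reviewer N]" and that values never contain the "|" character.

```python
import re

# Matches "[Reviewer 2] Some value"; the tag format is taken from the description.
CELL_PATTERN = re.compile(r"\[Reviewer (\d+)\]\s*(.+)")


def parse_cell(cell):
    """Return (agreed_value, None) for agreement, or (None, {reviewer: value}) for a conflict."""
    if "|" not in cell:
        return cell.strip(), None
    ratings = {}
    for part in cell.split("|"):
        match = CELL_PATTERN.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unexpected cell fragment: {part!r}")
        ratings[int(match.group(1))] = match.group(2).strip()
    return None, ratings


print(parse_cell("A"))
print(parse_cell("[Reviewer 2] A | [Reviewer 4] B"))
```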

    Our script for the calculation of the Fleiss' kappa score based on this data is also publicly available. It requires the Python packages pandas and statsmodels. It does not require any input and assumes that the file Initial Characterizations.csv is located in the same folder. It can be executed as follows:

    python3 CalculateKappa.py
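
    For readers who want to see what such a score measures, here is a dependency-free sketch of Fleiss' kappa. It is not the authors' CalculateKappa.py (which, per the text, relies on pandas and statsmodels); the function name and the toy count table are assumptions for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters placing subject i in category j.

    Every subject must have been rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Share of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement per subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    # Note: undefined (division by zero) if all ratings fall into one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy table: 4 applications, 2 reviewers each, 2 possible values.
toy_counts = [[2, 0], [2, 0], [0, 2], [1, 1]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 4))
```

    The toy table has three agreements and one conflict, which gives a kappa of 7/15.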
    

    Results Including Unknown Data

    In the following discussion and consolidation phase, the reviewers compared their notes and tried to reach a consensus for the characteristics with conflicting assignments. In a few cases, the two reviewers had different interpretations of a characteristic. These conflicts were discussed among all authors to ensure that characteristic interpretations were consistent. However, for most conflicts, the consolidation was a quick process as the most frequent type of conflict was that one reviewer found additional documentation that the other reviewer did not find.

    For six characteristics, many applications were assigned the "Unknown" value, i.e., the reviewers were not able to determine the value of this characteristic. Therefore, we excluded these characteristics from this study. For the remaining characteristics, the percentage of "Unknowns" ranges from 0–19% with two outliers at 25% and 30%. These "Unknowns" were excluded from the percentage values presented in the article. As part of our replication package, we provide the raw results for each characteristic including the "Unknown" percentages in the form of bar charts.

    The script for the generation of these bar charts is also part of this replication package. It uses the Python packages pandas, numpy, and matplotlib. It does not require any input and assumes that the file Dataset.csv is located in the same folder. It can be executed as follows:

    python3 GenerateResultsIncludingUnknown.py
    

    Final Dataset & Figure Generation

    Following the discussion and consolidation phase described above, we were able to resolve all conflicts, resulting in a collection of 89 applications described by 18 characteristics. This dataset is available here: link

    The script to generate all figures shown in the chapter "Serverless Application Characterization" can be found here. It does not require any input but assumes that the file Dataset.csv is located in the same folder. It uses the Python packages pandas, numpy, and matplotlib. It can be executed as follows:

    python3 GenerateFigures.py
    

    Comparison Study

    To identify existing surveys and datasets that also investigate one of our characteristics, we conducted a literature search using Google as our search engine, as we were mostly looking for grey literature. We used the following search term:

    ("serverless" OR "faas") AND ("dataset" OR "survey" OR "report") after: 2018-01-01
    

    This search term looks for any combination of either serverless or faas alongside any of the terms dataset, survey, or report. We further limited the search to any articles after 2017, as serverless is a fast-moving field and therefore any older studies are likely outdated already. This search term resulted in a total of 173 search results. In order to validate if using only a single search engine is sufficient, and if the search term is broad enough, we

  17. Data from: Analysis of the Quantitative Impact of Social Networks General...

    • figshare.com
    • produccioncientifica.ucm.es
    doc
    Updated Oct 14, 2022
    Cite
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1
    Explore at:
    Available download formats: doc
    Dataset updated
    Oct 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is this percentage higher or lower than that provided through direct traffic and through search engines via SEO positioning? Which social networks have a greater impact? And is there any relationship between the weight of social networks in the web traffic of a cybermedia outlet and factors such as the average duration of the user's visit, the number of page views, or the bounce rate, understood in its formal sense of not performing any interaction on the visited page beyond reading its content? To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that currently make up the European Union after the United Kingdom left on December 31, 2020. In each country we selected five media outlets using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use local metrics by country, since the results obtained with these two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, we selected cybermedia owned by a journalistic company, ruling out those belonging to telecommunications portals or service providers; some are classic news companies (both newspapers and television broadcasters) while others are digital natives, without this circumstance affecting the nature of the proposed research.
    Next, we examined the web traffic data of these cybermedia. The period selected covers October, November and December 2021 and January, February and March 2022. We believe that this six-month stretch smooths out possible one-off monthly variations, reinforcing the precision of the data obtained. To obtain this data, we used the SimilarWeb tool, currently the most precise tool available for examining the web traffic of a portal, although it is limited to traffic from desktops and laptops and does not account for traffic from mobile devices, which currently cannot be determined with the measurement tools on the market. It includes:

    • Web traffic general data: average visit duration, pages per visit and bounce rate
    • Web traffic origin by country
    • Percentage of traffic generated from social media over total web traffic
    • Distribution of web traffic generated from social networks
    • Comparison of web traffic generated from social networks with direct and search procedures

  18. msmarco-distilbert-margin-mse-cls-dot-v1

    • huggingface.co
    Updated Jun 24, 2025
    Cite
    Sentence Transformers (2025). msmarco-distilbert-margin-mse-cls-dot-v1 [Dataset]. https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-cls-dot-v1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Sentence Transformers
    Description

    MS MARCO with hard negatives from distilbert-margin-mse-cls-dot-v1

    MS MARCO is a large-scale information retrieval corpus that was created based on real user search queries using the Bing search engine. For each query and gold positive passage, the 50 most similar paragraphs were mined using 13 different models. The resulting data can be used to train Sentence Transformer models.

      Related Datasets
    

    These are the datasets generated using the 13 different models:… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-cls-dot-v1.

  19. Data from: Comparative Evaluation of Proteome Discoverer and FragPipe for...

    • acs.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Tianen He; Youqi Liu; Yan Zhou; Lu Li; He Wang; Shanjun Chen; Jinlong Gao; Wenhao Jiang; Yi Yu; Weigang Ge; Hui-Yin Chang; Ziquan Fan; Alexey I. Nesvizhskii; Tiannan Guo; Yaoting Sun (2023). Comparative Evaluation of Proteome Discoverer and FragPipe for the TMT-Based Proteome Quantification [Dataset]. http://doi.org/10.1021/acs.jproteome.2c00390.s002
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tianen He; Youqi Liu; Yan Zhou; Lu Li; He Wang; Shanjun Chen; Jinlong Gao; Wenhao Jiang; Yi Yu; Weigang Ge; Hui-Yin Chang; Ziquan Fan; Alexey I. Nesvizhskii; Tiannan Guo; Yaoting Sun
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Isobaric labeling-based proteomics is widely applied in deep proteome quantification. Among the platforms for isobaric labeled proteomic data analysis, the commercial software Proteome Discoverer (PD) is widely used, incorporating the search engine CHIMERYS, while FragPipe (FP) is relatively new, free for noncommercial purposes, and integrates the engine MSFragger. Here, we compared PD and FP over three public proteomic data sets labeled using 6plex, 10plex, and 16plex tandem mass tags. Our results showed the protein abundances generated by the two software are highly correlated. PD quantified more proteins (10.02%, 15.44%, 8.19%) than FP with comparable NA ratios (0.00% vs. 0.00%, 0.85% vs. 0.38%, and 11.74% vs. 10.52%) in the three data sets. Using the 16plex data set, PD and FP outputs showed high consistency in quantifying technical replicates, batch effects, and functional enrichment in differentially expressed proteins. However, FP saved 93.93%, 96.65%, and 96.41% of processing time compared to PD for analyzing the three data sets, respectively. In conclusion, while PD is a well-maintained commercial software integrating various additional functions and can quantify more proteins, FP is freely available and achieves similar output with a shorter computational time. Our results will guide users in choosing the most suitable quantification software for their needs.
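
    The "processing time saved" percentages quoted above can be restated as speed-up factors. The short sketch below only does that arithmetic; the function name is hypothetical and the three input values are the ones given in the description.

```python
def speedup_from_time_saved(saved_percent):
    """Convert 'X% of processing time saved' into a speed-up factor (old time / new time)."""
    remaining = 1.0 - saved_percent / 100.0
    if remaining <= 0:
        raise ValueError("saved_percent must be below 100")
    return 1.0 / remaining


# Percentages reported for the three data sets in the description.
for saved in (93.93, 96.65, 96.41):
    print(f"{saved}% saved -> about {speedup_from_time_saved(saved):.1f}x faster")
```

    Read this way, a saving of 93.93% corresponds to FP finishing in roughly one sixteenth of the time PD needed.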

  20. msmarco-co-condenser-margin-mse-cls-v1

    • huggingface.co
    Updated Jun 24, 2025
    Sentence Transformers (2025). msmarco-co-condenser-margin-mse-cls-v1 [Dataset]. https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-cls-v1
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Sentence Transformers
    Description

    MS MARCO with hard negatives from co-condenser-margin-mse-cls-v1

    MS MARCO is a large-scale information retrieval corpus built from real user search queries on the Bing search engine. For each query and gold positive passage, the 50 most similar paragraphs were mined using 13 different models. The resulting data can be used to train Sentence Transformer models.
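The mining step described above can be illustrated with a minimal sketch. It is not the dataset authors' pipeline: the function name `mine_hard_negatives`, the toy 2-D vectors, the use of cosine similarity, and the exclusion of the gold positive from the candidates are all assumptions made for illustration. In practice the embeddings would come from one of the retrieval models, and `k` would be 50.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_hard_negatives(query_vec, passage_vecs, positive_id, k=50):
    """Return the ids of the k passages most similar to the query,
    skipping the gold positive (assumption: the positive is not a negative)."""
    scored = [
        (cosine(query_vec, vec), pid)
        for pid, vec in passage_vecs.items()
        if pid != positive_id
    ]
    # Highest similarity first: the most confusable passages are the "hardest".
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:k]]


# Hypothetical toy embeddings; real ones would be model outputs.
passages = {
    "p0": [1.0, 0.0],  # gold positive
    "p1": [0.9, 0.1],
    "p2": [0.0, 1.0],
    "p3": [0.7, 0.7],
}
negatives = mine_hard_negatives([1.0, 0.0], passages, positive_id="p0", k=2)
print(negatives)
```

Each (query, positive, negative) triple produced this way can serve as a training example in which the negative is deliberately hard to tell apart from the positive.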

    Related Datasets

    These are the datasets generated using the 13 different models:… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-cls-v1.

