57 datasets found
  1. Evolution of Web search engine interfaces through SERP screenshots and HTML...

    • rdm.inesctec.pt
    Updated Jul 26, 2021
    Cite
    (2021). Evolution of Web search engine interfaces through SERP screenshots and HTML complete pages for 20 years - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2021-003
    Dataset updated
    Jul 26, 2021
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of "10 blue links" has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. We used the most searched queries by year to extract a representative sample of SERPs from the Internet Archive, which has been keeping snapshots and the respective HTML versions of webpages over time; its collection contains more than 50 billion webpages. We used Python and Selenium WebDriver for browser automation to visit each capture online, check whether the capture is valid, save the HTML version, and generate a full-page screenshot.

    The dataset contains all the extracted captures. Each capture is represented by a screenshot, an HTML file, and a folder of associated files. For file naming, we concatenate the initial of the search engine (G) with the capture's timestamp. If the timestamp is repeated, the filename ends with a sequential integer "-N". For example, "G20070330145203-1" identifies the second Google capture with the timestamp of March 30, 2007, 14:52:03; the first is identified by "G20070330145203".

    Using this dataset, we analyzed how SERPs evolved in terms of content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We recorded the appearance of SERP features and analyzed the design patterns involved in each SERP component. We found that the number of elements in SERPs has been rising over the years, demanding a larger interface area and larger files. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, in web design. We expect this work to trigger other, more specific studies that can take advantage of the dataset we provide here. An accompanying graphic shows the diversity of captures by year and search engine (Google and Bing).
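    The file-naming convention can be decoded mechanically. This is a minimal sketch, assuming that names (without extensions) follow exactly the pattern described: one capital letter for the engine, a 14-digit Internet Archive style timestamp (YYYYMMDDhhmmss), and an optional "-N" suffix. The names `parse_capture_name` and `Capture` are hypothetical helpers, not part of the dataset.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed pattern: engine initial + 14-digit timestamp + optional "-N" suffix.
CAPTURE_NAME = re.compile(r"^(?P<engine>[A-Z])(?P<ts>\d{14})(?:-(?P<seq>\d+))?$")


class Capture(NamedTuple):
    engine: str        # e.g. "G" for Google
    taken_at: datetime
    index: int         # 0 for the first capture at a timestamp, N for a "-N" suffix


def parse_capture_name(name: str) -> Capture:
    """Split a capture file name into engine, timestamp and repeat index."""
    m = CAPTURE_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized capture name: {name!r}")
    taken_at = datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S")
    index = int(m.group("seq")) if m.group("seq") else 0
    return Capture(m.group("engine"), taken_at, index)
```

    For example, `parse_capture_name("G20070330145203-1")` yields engine "G", a timestamp of 2007-03-30 14:52:03 and index 1, i.e. the second capture at that timestamp.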

  2. Google Trends and Wikipedia Page Views

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Jan 24, 2020
    Cite
    Mitsuo Yoshida (2020). Google Trends and Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mitsuo Yoshida
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract (our paper)

    The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.
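    The abstract reports correlations between search frequency and page views but does not say which coefficient was used. As an illustrative assumption, Pearson's coefficient is one common choice; this sketch computes it for two equally long series, such as weekly Google Trends values and the page views for the matching dates.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

    A value near 1 means the two series rise and fall together; the paper's actual method and thresholds may differ.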

    Data

    personal-name.txt.gz:
    The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.

    personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz:
    The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.
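    The column layout above is enough to line the two sources up by date. This is a minimal sketch, assuming the files are tab-separated UTF-8 text (the delimiter is not stated here) and that the source column contains the literal strings "Google" and "Wikipedia"; `parse_rows`, `load_series` and `paired_values` are hypothetical helpers. Only dates present in both series are paired, so data at different granularities would need resampling first.

```python
import csv
import gzip
from collections import defaultdict


def parse_rows(lines):
    """Map (article_id, source) -> {date: value}.

    Assumes tab-separated columns in the documented order:
    period, source, article id, search keyword, date, value.
    """
    series = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        _period, source, article_id, _keyword, date, value = row[:6]
        series[(article_id, source)][date] = float(value)
    return series


def load_series(path):
    """Read one of the *_data_*.txt.gz files with parse_rows."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return parse_rows(fh)


def paired_values(series, article_id, a="Google", b="Wikipedia"):
    """Values from sources a and b on the dates both series share."""
    xs = series.get((article_id, a), {})
    ys = series.get((article_id, b), {})
    dates = sorted(set(xs) & set(ys))
    return [xs[d] for d in dates], [ys[d] for d in dates]
```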

    Publication

    This data set was created for our study. If you make use of this data set, please cite:
    Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
    http://dx.doi.org/10.1145/2786451.2786495
    http://arxiv.org/abs/1509.02218 (author-created version)

    Note

    The raw Wikipedia page view data is available on the following page:
    http://dumps.wikimedia.org/other/pagecounts-raw/

  3. Search Engines in Germany - Market Research Report (2015-2030)

    • ibisworld.com
    Updated Jun 19, 2024
    Cite
    IBISWorld (2024). Search Engines in Germany - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/germany/industry/search-engines/935/
    Dataset updated
    Jun 19, 2024
    Dataset authored and provided by
    IBISWorld
    License

    https://www.ibisworld.com/about/termsofuse/

    Time period covered
    2014 - 2029
    Area covered
    Germany
    Description

    In the last five years, the web portal industry has recorded significant revenue growth. Industry revenue increased by an average of 3.8% per year between 2019 and 2024 and is expected to reach 12.6 billion euros in the current year. The web portal industry comprises a variety of platforms such as social networks, search engines, video platforms and email services that are used by millions of users every day. These portals enable the exchange of information and communication as well as entertainment. Web portals generate their revenue mainly through advertising, premium services and commission payments. User numbers are rising steadily as more and more people go online and everyday processes are increasingly digitalised.

    In 2024, industry revenue is expected to increase by 3.2%. Although the industry is growing, it is also facing challenges, particularly in terms of data protection. Web portals are constantly collecting user data, which can lead to misuse of the collected data. The General Data Protection Regulation (GDPR) introduced in the European Union in 2018 has prompted web portal operators to review their data protection practices and amend their terms and conditions in order to avoid fines. The aim of this regulation is to improve the protection of personal data and prevent data misuse.

    The industry's turnover is expected to increase by an average of 3.6% per year to 15 billion euros over the next five years. Video platforms such as YouTube often generate losses despite high user numbers. The reasons for this are the high costs of operation and infrastructure as well as expenses for copyright issues and compliance. Advertising on video platforms is perceived negatively by users, but is successful when it comes to attracting attention. Politicians are debating the taxation of revenues generated by internationally operating web portals based in tax havens. Another challenge is the copying of concepts, which inhibits innovation in the industry and can lead to legal problems.
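    The growth figures above can be checked with simple compound-growth arithmetic. This sketch assumes plain annual compounding (the report does not state its method); `project` and `implied_cagr` are illustrative helper names.

```python
def project(value: float, rate: float, years: int) -> float:
    """Grow `value` at a constant annual `rate` for `years` years."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


revenue_now = 12.6  # billion euros, the figure stated for the current year
print(f"{project(revenue_now, 0.036, 5):.2f}")       # prints 15.04
print(f"{implied_cagr(revenue_now, 15.0, 5):.4f}")   # prints 0.0355
```

    Compounding 12.6 billion euros at 3.6% for five years gives about 15.04 billion, and the exact rate needed to reach 15 billion is about 3.55%, so the stated average growth and the five-year target agree after rounding.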

  4. Google SERP Data, Web Search Data, Google Images Data | Real-Time API

    • datarade.ai
    .json, .csv
    Updated May 17, 2024
    Cite
    OpenWeb Ninja (2024). Google SERP Data, Web Search Data, Google Images Data | Real-Time API [Dataset]. https://datarade.ai/data-products/openweb-ninja-google-data-google-image-data-google-serp-d-openweb-ninja
    Available download formats: .json, .csv
    Dataset updated
    May 17, 2024
    Dataset authored and provided by
    OpenWeb Ninja
    Area covered
    Panama, Burundi, Uganda, Barbados, South Georgia and the South Sandwich Islands, Ireland, Tokelau, Virgin Islands (U.S.), Uruguay, Grenada
    Description

    OpenWeb Ninja's Google Images Data (Google SERP Data) API provides real-time image search capabilities for images sourced from all public sources on the web.

    The API enables you to search and access more than 100 billion images from across the web including advanced filtering capabilities as supported by Google Advanced Image Search. The API provides Google Images Data (Google SERP Data) including details such as image URL, title, size information, thumbnail, source information, and more data points. The API supports advanced filtering and options such as file type, image color, usage rights, creation time, and more. In addition, any Advanced Google Search operators can be used with the API.

    OpenWeb Ninja's Google Images Data & Google SERP Data API common use cases:

    • Creative Media Production: Enhance digital content with a vast array of real-time images, ensuring engaging and brand-aligned visuals for blogs, social media, and advertising.

    • AI Model Enhancement: Train and refine AI models with diverse, annotated images, improving object recognition and image classification accuracy.

    • Trend Analysis: Identify emerging market trends and consumer preferences through real-time visual data, enabling proactive business decisions.

    • Innovative Product Design: Inspire product innovation by exploring current design trends and competitor products, ensuring market-relevant offerings.

    • Advanced Search Optimization: Improve search engines and applications with enriched image datasets, providing users with accurate, relevant, and visually appealing search results.

    OpenWeb Ninja's Annotated Imagery Data & Google SERP Data Stats & Capabilities:

    • 100B+ Images: Access an extensive database of over 100 billion images.

    • Images Data from all Public Sources (Google SERP Data): Benefit from a comprehensive aggregation of image data from various public websites, ensuring a wide range of sources and perspectives.

    • Extensive Search and Filtering Capabilities: Utilize advanced search operators and filters to refine image searches by file type, color, usage rights, creation time, and more, making it easy to find exactly what you need.

    • Rich Data Points: Each image comes with more than 10 data points, including URL, title (annotation), size information, thumbnail, and source information, providing a detailed context for each image.
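    The listed data points suggest a simple record type for working with returned results on the client side. This is a minimal sketch only: the field names (`url`, `title`, `width`, `height`, `thumbnail`, `source`) and the JSON-array shape are hypothetical stand-ins for the data points described above, not OpenWeb Ninja's documented response schema, and no API call is shown.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageResult:
    # Hypothetical field names mirroring the data points listed above.
    url: str
    title: str = ""
    width: Optional[int] = None
    height: Optional[int] = None
    thumbnail: str = ""
    source: str = ""


def parse_results(payload: str) -> List[ImageResult]:
    """Turn a JSON array of image records into ImageResult objects."""
    return [
        ImageResult(
            url=rec["url"],
            title=rec.get("title", ""),
            width=rec.get("width"),
            height=rec.get("height"),
            thumbnail=rec.get("thumbnail", ""),
            source=rec.get("source", ""),
        )
        for rec in json.loads(payload)
    ]


def large_enough(results: List[ImageResult], min_width: int, min_height: int) -> List[ImageResult]:
    """Keep results whose known dimensions meet both minimums (unknown sizes are dropped)."""
    return [
        r for r in results
        if r.width is not None and r.height is not None
        and r.width >= min_width and r.height >= min_height
    ]
```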

  5. Search Engines Comparison and Websites Performance

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jul 1, 2023
    Cite
    Georgios Ntimo; Vasilios Ntararas (2023). Search Engines Comparison and Websites Performance [Dataset]. http://doi.org/10.5281/zenodo.8102700
    Available download formats: bin
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Georgios Ntimo; Vasilios Ntararas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 200 search results extracted from the Google and Bing engines (100 from Google and 100 from Bing). The search terms were selected from the 10 most searched keywords of 2021, based on Google Trends data. The remaining sheets contain the performance of the websites according to three technical evaluation aspects: SEO, speed, and security. The performance data were collected using the CheckBot crawling tool. Taken together, the dataset can help information retrieval scientists compare the two engines in terms of result position/ranking and the performance of the ranked websites on these factors.
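    One way to compare the two engines' rankings for the same keyword is to measure how many results they share and how far shared results move. This sketch is an illustrative choice, not the authors' method; it assumes each engine's results for a keyword are available as an ordered list of URLs (10 per engine per keyword, given 100 results over 10 keywords). `overlap_at_k` and `mean_rank_shift` are hypothetical helpers, and URLs may need normalizing (e.g. trailing slashes, http vs https) before comparison.

```python
from typing import Dict, List, Sequence


def overlap_at_k(a: Sequence[str], b: Sequence[str], k: int = 10) -> float:
    """Fraction of the k slots whose URL appears in both engines' top-k lists."""
    return len(set(a[:k]) & set(b[:k])) / k


def mean_rank_shift(a: Sequence[str], b: Sequence[str]) -> float:
    """Mean absolute difference in 1-based rank for URLs found in both lists."""
    rank_b: Dict[str, int] = {url: i for i, url in enumerate(b, start=1)}
    shifts: List[int] = [
        abs(i - rank_b[url]) for i, url in enumerate(a, start=1) if url in rank_b
    ]
    return sum(shifts) / len(shifts) if shifts else float("nan")
```

    With Google returning `["a", "b", "c", "d"]` and Bing `["b", "a", "e", "f"]`, half of the top-4 slots overlap and each shared URL moves by one position.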

    For more information about the rationale behind the structure of the dataset, please contact the Information Management Lab of the University of West Attica.

    Contact Persons: Vasilis Ntararas (lb17032@uniwa.gr) , Georgios Ntimo (lb17100@uniwa.gr) and Ioannis C. Drivas (idrivas@uniwa.gr)

  6. Next Generation Search Engines Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Next Generation Search Engines Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/next-generation-search-engines-market-global-industry-analysis
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Next Generation Search Engines Market Outlook

    According to our latest research, the global Next Generation Search Engines market size reached USD 16.2 billion in 2024, with robust year-on-year growth driven by rapid technological advancements and escalating demand for intelligent search solutions across industries. The market is expected to grow at a CAGR of 18.7% during the forecast period from 2025 to 2033, reaching a projected value of USD 82.3 billion by 2033. The accelerating adoption of artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) within search technologies is a key growth factor, as organizations seek more accurate, context-aware, and personalized information retrieval solutions.
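
    As a rough arithmetic check, assuming plain annual compounding from the 2024 base over the nine years 2025 to 2033 (the report does not state its convention), the stated figures relate as follows:

```latex
\begin{aligned}
16.2 \times (1 + 0.187)^{9} &\approx 16.2 \times 4.678 \approx 75.8 \quad \text{(USD billion)},\\
\left(\frac{82.3}{16.2}\right)^{1/9} - 1 &\approx 0.198 .
\end{aligned}
```

    Under that assumption, an 18.7% CAGR would give roughly USD 75.8 billion by 2033, while reaching USD 82.3 billion would take about 19.8% a year, so the report may use a different base year or compounding convention.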

    One of the most significant growth drivers for the Next Generation Search Engines market is the exponential increase in digital content and data generation worldwide. Enterprises and consumers alike are producing vast amounts of unstructured data daily, from documents and emails to social media posts and multimedia files. Traditional search engines often struggle to deliver relevant results from such complex datasets. Next generation search engines, powered by AI and ML algorithms, are uniquely positioned to address this challenge by providing semantic understanding, contextual relevance, and intent-driven results. This capability is especially critical for industries like healthcare, BFSI, and e-commerce, where timely and precise information retrieval can directly impact decision-making, operational efficiency, and customer satisfaction.

    Another major factor fueling the growth of the Next Generation Search Engines market is the proliferation of mobile devices and the evolution of user interaction paradigms. As consumers increasingly rely on smartphones, tablets, and voice assistants, there is a growing demand for search solutions that support voice and visual queries, in addition to traditional text-based searches. Technologies such as voice search and visual search are gaining traction, enabling users to interact with search engines more naturally and intuitively. This shift is prompting enterprises to invest in advanced search platforms that can seamlessly integrate with diverse devices and channels, enhancing user engagement and accessibility. The integration of NLP further empowers these platforms to understand complex queries, colloquial language, and regional dialects, making search experiences more inclusive and effective.

    Furthermore, the rise of enterprise digital transformation initiatives is accelerating the adoption of next generation search technologies across various sectors. Organizations are increasingly seeking to unlock the value of their internal data assets by deploying enterprise search solutions that can index, analyze, and retrieve information from multiple sources, including databases, intranets, cloud storage, and third-party applications. These advanced search engines not only improve knowledge management and collaboration but also support compliance, security, and data governance requirements. As businesses continue to embrace hybrid and remote work models, the need for efficient, secure, and scalable search capabilities becomes even more pronounced, driving sustained investment in this market.

    Regionally, North America currently dominates the Next Generation Search Engines market, owing to the early adoption of AI-driven technologies, strong presence of leading technology vendors, and high digital literacy rates. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, expanding internet penetration, and increasing investments in AI research and development. Europe is also witnessing steady growth, supported by robust regulatory frameworks and growing demand for advanced search solutions in sectors such as BFSI, healthcare, and education. Latin America and the Middle East & Africa are gradually catching up, as enterprises in these regions recognize the value of next generation search engines in enhancing operational efficiency and customer experience.

  7. Data from: S1 Dataset -

    • figshare.com
    • plos.figshare.com
    xlsx
    Updated Jan 24, 2025
    Cite
    Muath Saad Alassaf; Ayman Bakkari; Jehad Saleh; Abdulsamad Habeeb; Bashaer Fahad Aljuhani; Ahmad A. Qazali; Ahmed Yaseen Alqutaibi (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0312832.s002
    Available download formats: xlsx
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Muath Saad Alassaf; Ayman Bakkari; Jehad Saleh; Abdulsamad Habeeb; Bashaer Fahad Aljuhani; Ahmad A. Qazali; Ahmed Yaseen Alqutaibi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This study aimed to investigate the quality and readability of online English-language health information about dental sensitivity and how patients evaluate and use this web-based information.

    Methods: Health information was retrieved from three search engines and assessed for credibility and readability. Searches were conducted in "incognito" mode to reduce the possibility of bias. Quality was assessed using the JAMA benchmarks, the DISCERN tool, and HONcode. Readability was analyzed using the SMOG, FRE, and FKGL indices.

    Results: Of 600 websites, 90 were included; 62.2% were affiliated with dental or medical centers, and among these websites, 80% related exclusively to dental implant treatments. Regarding the JAMA benchmarks, currency was the most commonly achieved criterion, and 87.8% of websites fell into the "moderate quality" category. Word and sentence counts varied widely, with means of 815.7 (±435.4) and 60.2 (±33.3), respectively. FKGL averaged 8.6 (±1.6), SMOG averaged 7.6 (±1.1), and FRE had a mean of 58.28 (±9.1), with "fairly difficult" being the most common category.

    Conclusion: The overall DISCERN evaluation indicated a moderate quality level, with a notable absence of referencing. The JAMA benchmarks revealed general non-adherence among websites, as none of the websites met all four criteria. Only one website was HONcode-certified, suggesting a lack of reliable sources of accurate web-based health information. Readability assessments varied, with the majority rated "fairly difficult". Although readability did not differ significantly across affiliations, word and sentence counts varied widely between them.
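    The three readability indices named above have standard published formulas. This is a minimal sketch of them; the syllable counter is a crude vowel-group heuristic (an assumption, not the study's tool), and the sentence and word splitting is simplistic, so scores will not exactly match dedicated readability software or the values reported above.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic; dedicated tools use dictionaries or better rules."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1  # treat a final "e" as silent, e.g. "make"
    return max(n, 1)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_grade(polysyllables: int, sentences: int) -> float:
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def text_scores(text: str) -> dict:
    """FRE, FKGL and SMOG for a block of plain text."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)
    return {
        "FRE": flesch_reading_ease(len(words), sentences, sum(syllables)),
        "FKGL": flesch_kincaid_grade(len(words), sentences, sum(syllables)),
        "SMOG": smog_grade(polysyllables, sentences),
    }
```

    Note that SMOG was designed for samples of 30 sentences; the formula's `30 / sentences` factor rescales shorter texts.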

  8. Indian Pharmaceutical Products

    • kaggle.com
    Updated Jun 19, 2025
    Cite
    Rishgeeky (2025). Indian Pharmaceutical Products [Dataset]. http://doi.org/10.34740/kaggle/ds/7699271
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    Kaggle
    Authors
    Rishgeeky
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains a comprehensive collection of over 250,000 pharmaceutical products available in India, including details like medicine name, price (INR), manufacturer, packaging, and active compositions.

    Each entry reflects structured real-world pharmaceutical product data, useful for analyzing trends in medicine pricing, formulations, discontinued products, and market competition. The dataset was cleaned to remove duplicates, extract quantities from packaging labels, and enrich fields like medicine form and composition structure.

    Columns Included:

    • id: Unique ID for each medicine
    • name: Brand name of the drug
    • price_inr: Retail price in Indian Rupees
    • is_discontinued: Whether the product is active or discontinued
    • manufacturer_name: Drug manufacturing company
    • packaging: Original packaging info (e.g., "strip of 10 tablets")
    • pack_quantity: Number or volume extracted from packaging
    • pack_unit: Unit of measurement (e.g., tablets, ml)
    • active_ingredient_1 & active_ingredient_2: Composition of the medicine
    • medicine_form: Extracted form such as Tablet, Syrup, Injection, etc.
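    The description says quantities were extracted from packaging labels into `pack_quantity` and `pack_unit`. The author's cleaning code is not shown; this is one possible approach, assuming labels contain a "number unit" pair such as "strip of 10 tablets". `parse_packaging` is a hypothetical helper, and labels with several numbers (for example multi-packs) would need extra rules.

```python
import re
from typing import Optional, Tuple

# First "number unit" pair in a label, e.g. "10 tablets", "100 ml" or "2.5ml".
PACK_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")


def parse_packaging(label: str) -> Optional[Tuple[float, str]]:
    """Return (quantity, unit) from a packaging label, or None if no match."""
    m = PACK_RE.search(label)
    if m is None:
        return None
    return float(m.group(1)), m.group(2).lower()
```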

    Possible Use Cases:

    • Analyzing drug price variations across manufacturers
    • Identifying top manufacturers or most common drug compositions
    • Drug recommendation or search engine (based on active ingredients)
    • Research in pharmacoeconomics, generic vs. branded pricing

    Disclaimer: This dataset is compiled for educational and analytical use only. It does not provide medical advice or endorsements.

  9. google_search_terms_training_data

    • huggingface.co
    Updated Jul 28, 2024
    Cite
    Hoshang Chenoy (2024). google_search_terms_training_data [Dataset]. https://huggingface.co/datasets/hoshangc/google_search_terms_training_data
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2024
    Authors
    Hoshang Chenoy
    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

    Dataset Details

    Dataset Description

    Dataset Name: Google Search Trends Top Rising Search Terms
    Description: The Google Search Trends Top Rising Search Terms dataset provides valuable insights into the most rapidly growing search queries on the Google search engine. It offers a comprehensive collection of trending search… See the full description on the dataset page: https://huggingface.co/datasets/hoshangc/google_search_terms_training_data.

  10. How to Login DuckDuckGo Account? | A Step-By-Step Guide Dataset

    • paperswithcode.com
    Updated Jun 17, 2025
    Cite
    (2025). How to Login DuckDuckGo Account? | A Step-By-Step Guide Dataset [Dataset]. https://paperswithcode.com/dataset/how-to-login-duckduckgo-account-a-step-by
    Dataset updated
    Jun 17, 2025
    Description


    In today’s digital age, privacy has become one of the most valued aspects of online activity. With increasing concerns over data tracking, surveillance, and targeted advertising, users are turning to privacy-first alternatives for everyday browsing. One of the most recognized names in private search is DuckDuckGo. Unlike mainstream search engines, DuckDuckGo emphasizes anonymity and transparency. However, many people wonder: is there such a thing as a DuckDuckGo login account?

    In this comprehensive guide, we’ll explore everything you need to know about the DuckDuckGo login account, what it offers (or doesn’t), and how to get the most out of DuckDuckGo’s privacy features.

    Does DuckDuckGo Offer a Login Account?

    To clarify up front: DuckDuckGo does not require or offer a traditional login account like Google or Yahoo. The concept of a DuckDuckGo login account is somewhat misleading if interpreted through the lens of typical internet services.

    DuckDuckGo's entire business model is built around privacy. The company does not track users, store personal information, or create user profiles. As a result, there’s no need—or intention—to implement a system that asks users to log in. This stands in stark contrast to other search engines that rely on login-based ecosystems to collect and use personal data for targeted ads.

    That said, some users still search for the term DuckDuckGo login account, usually because they’re trying to save settings, sync devices, or use features that may suggest a form of account system. Let’s break down what’s possible and what alternatives exist within DuckDuckGo’s platform.

    Saving Settings Without a DuckDuckGo Login Account

    Even without a traditional DuckDuckGo login account, users can still save their preferences. DuckDuckGo provides two primary ways to retain search settings:

    Local Storage (Cookies)

    When you customize your settings on the DuckDuckGo homepage, such as theme, region, or safe search options, those preferences are stored in your browser’s local storage. As long as you don’t clear cookies or use incognito mode, these settings will persist.

    Cloud Save Feature

    To cater to users who want to retain settings across multiple devices without a DuckDuckGo login account, DuckDuckGo offers a feature called "Cloud Save." Instead of creating an account with a username or password, you generate a passphrase or unique key. This key can be used to retrieve your saved settings on another device or browser.

    While it’s not a conventional login system, it’s the closest DuckDuckGo comes to offering account-like functionality—without compromising privacy.

    Why DuckDuckGo Avoids Login Accounts

    Understanding why there is no DuckDuckGo login account comes down to the company’s core mission: to offer a private, non-tracking search experience. Introducing login accounts would:

    Require collecting some user data (e.g., email, password)

    Introduce potential tracking mechanisms

    Undermine their commitment to full anonymity

    By avoiding a login system, DuckDuckGo keeps user trust intact and continues to deliver on its promise of complete privacy. For users who value anonymity, the absence of a DuckDuckGo login account is actually a feature, not a flaw.

    DuckDuckGo and Device Syncing

    One of the most commonly searched reasons behind the term DuckDuckGo login account is the desire to sync settings or preferences across multiple devices. Although DuckDuckGo doesn’t use accounts, the Cloud Save feature mentioned earlier serves this purpose without compromising security or anonymity.

    You simply export your settings using a unique passphrase on one device, then import them using the same phrase on another. This offers similar benefits to a synced account—without the need for usernames, passwords, or emails.

    DuckDuckGo Privacy Tools Without a Login

    DuckDuckGo is more than just a search engine. It also offers a range of privacy tools, all without needing a DuckDuckGo login account:

    DuckDuckGo Privacy Browser (Mobile): Available for iOS and Android, this browser includes tracking protection, forced HTTPS, and built-in private search.

    DuckDuckGo Privacy Essentials (Desktop Extension): For Chrome, Firefox, and Edge, this extension blocks trackers, grades websites on privacy, and enhances encryption.

    Email Protection: DuckDuckGo recently launched a service that allows users to create "@duck.com" email addresses that forward to their real email—removing trackers in the process. Users sign up for this using a token or limited identifier, but it still doesn’t constitute a full DuckDuckGo login account.

    Is a DuckDuckGo Login Account Needed?

    For most users, the absence of a DuckDuckGo login account is not only acceptable; it’s ideal. You can:

    Use the search engine privately

    Customize and save settings

    Sync preferences across devices

    Block trackers and protect email

    —all without an account.

    While some people may find the lack of a traditional login unfamiliar at first, it quickly becomes a refreshing break from constant credential requests, data tracking, and login fatigue.

    The Future of DuckDuckGo Accounts

    As of now, DuckDuckGo maintains its position against traditional account systems. However, it’s clear the company is exploring privacy-preserving ways to offer more user features, like Email Protection and Cloud Save. These features may continue to evolve, but the core commitment remains: no tracking, no personal data storage, and no typical DuckDuckGo login account.

    Final Thoughts

    While the term DuckDuckGo login account is frequently searched, it represents a misunderstanding of how the platform operates. Unlike other tech companies that monetize personal data, DuckDuckGo has stayed true to its promise of privacy.

  11. agnewsadapted

    • huggingface.co
    Updated Apr 13, 2023
    + more versions
    Cite
    Eduardo Ferreira Brigham (2023). agnewsadapted [Dataset]. https://huggingface.co/datasets/ebrigham/agnewsadapted
    Dataset updated
    Apr 13, 2023
    Authors
    Eduardo Ferreira Brigham
    Description

    AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the corpus page (http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html). The AG's news topic classification dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

  12. Data from: Inventory of online public databases and repositories holding...

    • s.cnmilf.com
    • datadiscoverystudio.org
    • +4 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service https://www.ars.usda.gov/
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected, if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data-sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

    • establish where agricultural researchers in the United States (land-grant and USDA researchers, primarily ARS, NRCS, USFS, and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals;
    • compare how much data is in institutional vs. domain-specific vs. federal platforms;
    • determine which repositories are recommended by top journals that require or recommend the publication of supporting data; and
    • ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
    To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and hold some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain-specific USDA / ARS datasets and databases represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then used search engines such as Bing and Google to find non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects. We then used Bing and Google to find top agricultural university repositories, using variations of “agriculture”, “ag data”, and “university” to find schools with agriculture programs. Using that list of universities, we searched each university website to see whether the institution had a repository for its unique, independent research data, if one was not apparent in the initial web search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories; results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
    General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next, we searched the internet for open general data repositories using a variety of search engines; repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo. Finally, we compared the data repositories suggested by scholarly journals against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA employees published in 2012 and 2016 were compiled by combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA websites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The author instructions of the top 50 journals were consulted to see whether they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Journal data are based on a 2012 and 2016 study of where USDA employees publish their research, ranked by number of articles, and include the 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and Recommended data repositories, as given in the online author guidelines of each of the top 50 journals.
    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository; the type of resource searched (datasets, data, images, components, etc.); the percentage of the total database that each term comprised; any search term that comprised at least 1% and at least 5% of the total collection; and any search term that returned more than 100 and more than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review:

    • Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
    • There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    • Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See the included README file for descriptions of each individual data file in this dataset. Resources in this dataset:

    • Journals: Journals.csv
    • Journals - Recommended repositories: Repos_from_journals.csv
    • TDWG presentation: TDWG_Presentation.pptx
    • Domain Specific ag data sources: domain_specific_ag_databases.csv
    • Data Dictionary for Ag Data Repository Inventory: Ag_Data_Repo_DD.csv
    • General repositories containing ag data: general_repos_1.csv
    • README and file inventory: README_InventoryPublicDBandREepAgData.txt
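    The evaluation criteria described above (the share of a repository's total collection returned by a search term, with 1% and 5% cut-offs, and hit counts above 100 and 500) can be expressed as a small filter. This is an illustrative sketch only: the `TermResult` record, the repository names, and the counts are hypothetical and are not taken from the inventory files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermResult:
    """Hits for one ag search term in one repository (hypothetical record)."""
    repository: str
    term: str
    hits: int
    total_datasets: int

    @property
    def share(self) -> float:
        # Fraction of the whole collection returned by this term.
        return self.hits / self.total_datasets


def bands(r: TermResult) -> dict:
    """Which of the four thresholds used in the evaluation this result meets."""
    return {
        "share>=1%": r.share >= 0.01,
        "share>=5%": r.share >= 0.05,
        "hits>100": r.hits > 100,
        "hits>500": r.hits > 500,
    }


def large_ag_repositories(results, min_hits=500, min_share=0.05):
    """Repositories where at least one term returned more than `min_hits`
    datasets and that result made up at least `min_share` of the collection."""
    return sorted({
        r.repository
        for r in results
        if r.hits > min_hits and r.share >= min_share
    })


# Made-up example counts, for illustration only.
results = [
    TermResult("repo_a", "agriculture", 800, 10_000),  # 8% share, 800 hits
    TermResult("repo_b", "agriculture", 600, 50_000),  # only 1.2% share
    TermResult("repo_c", "crop", 120, 1_000),          # 12% share, few hits
]
print(large_ag_repositories(results))  # ['repo_a']
```

    Keeping the thresholds as parameters makes it easy to rerun the same classification with the 1% / 100-hit bands instead of the 5% / 500-hit ones.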

  13. Quantum-Enhanced Neural Search Engine Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Growth Market Reports (2025). Quantum-Enhanced Neural Search Engine Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/quantum-enhanced-neural-search-engine-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Quantum-Enhanced Neural Search Engine Market Outlook



    According to our latest research, the Quantum-Enhanced Neural Search Engine market size reached USD 1.82 billion globally in 2024, reflecting the rapid adoption of quantum computing and advanced neural network architectures in enterprise search solutions. The market is projected to grow at a robust CAGR of 28.7% from 2025 to 2033, culminating in a forecasted market size of USD 15.46 billion by the end of 2033. This remarkable trajectory is primarily driven by the demand for highly efficient, accurate, and context-aware search engines capable of processing vast and complex datasets across industries.
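    The headline figures follow the usual compound annual growth rate (CAGR) relation, end = start × (1 + r)^n. The sketch below checks that arithmetic under an assumption the report does not state explicitly: the 2024 value is the base, and there are nine annual compounding periods to 2033.

```python
def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


base_2024 = 1.82  # USD billion, reported 2024 market size
print(f"{project(base_2024, 0.287, 9):.2f}")       # 17.63
print(f"{implied_cagr(base_2024, 15.46, 9):.1%}")  # 26.8%
```

    Under these assumptions, 28.7% over nine years would lead to roughly USD 17.6 billion, while USD 15.46 billion corresponds to about 26.8% per year. The reported rate and endpoints therefore presumably rest on a different base year or period count.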



    Several key growth factors are propelling the quantum-enhanced neural search engine market forward. The exponential increase in unstructured data, combined with the limitations of classical search algorithms, has created a significant need for more sophisticated search technologies. Quantum computing, when integrated with neural search algorithms, delivers unparalleled computational power and speed, enabling real-time semantic understanding and contextual relevance in search results. Organizations across sectors such as healthcare, finance, and e-commerce are investing heavily in these technologies to improve data-driven decision-making, enhance user experiences, and maintain a competitive edge in the digital era. The synergy between quantum computing and neural networks is unlocking new possibilities for natural language processing, image recognition, and predictive analytics, further fueling market growth.



    Another significant driver is the growing adoption of artificial intelligence and machine learning across enterprise operations. As businesses transition towards digital transformation, the need for intelligent search capabilities that can extract actionable insights from massive datasets becomes increasingly critical. Quantum-enhanced neural search engines offer a transformative leap in search efficiency, delivering faster and more accurate results than traditional systems. This is particularly valuable for industries dealing with sensitive or time-critical information, such as BFSI and healthcare, where the ability to retrieve relevant data instantaneously can have a direct impact on operational efficiency and customer satisfaction. Additionally, the scalability and adaptability of these solutions make them attractive to both large enterprises and SMEs, supporting widespread market penetration.



    The ongoing advancements in quantum hardware and software ecosystems are also contributing to the market’s expansion. Major technology players and startups alike are investing in the development of quantum processors, quantum-safe algorithms, and hybrid quantum-classical architectures tailored for search applications. As quantum computing becomes more accessible through cloud-based platforms, organizations of all sizes can leverage its power without the need for significant upfront infrastructure investments. This democratization of quantum technology is expected to accelerate adoption rates, drive innovation in search engine design, and lower barriers to entry for new market participants. Furthermore, collaborative efforts between academia, industry, and government agencies are fostering a vibrant ecosystem that supports research, standardization, and commercialization of quantum-enhanced neural search solutions.



    From a regional perspective, North America currently leads the quantum-enhanced neural search engine market, accounting for the largest share in 2024, primarily due to its advanced technological infrastructure, significant R&D investments, and early adoption by key industry players. Europe follows closely, supported by robust governmental initiatives and a strong presence of quantum research institutions. The Asia Pacific region is witnessing the fastest growth, driven by increasing digitalization, expanding tech startups, and supportive regulatory frameworks, particularly in countries like China, Japan, and South Korea. Latin America and the Middle East & Africa are also emerging as promising markets, with growing interest in quantum technologies and AI-driven solutions to address local industry challenges. Each region presents unique opportunities and challenges, shaping the competitive landscape and influencing market dynamics over the forecast period.



  14. Histione HCD and CID datasets for evaluation of search engines

    • data.niaid.nih.gov
    • ebi.ac.uk
    xml
    Updated Sep 1, 2014
    Zuo-Fei Yuan; Benjamin A. Garcia (2014). Histione HCD and CID datasets for evaluation of search engines [Dataset]. https://data.niaid.nih.gov/resources?id=pxd001118
    Available download formats: xml
    Dataset updated
    Sep 1, 2014
    Dataset provided by
    Epigenetics Program, Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania
    UPenn
    Authors
    Zuo-Fei Yuan; Benjamin A. Garcia
    Variables measured
    Proteomics
    Description

    To identify confident spectra from histone peptides containing PTMs, we present a method in which one kind of modification is searched at a time. We then combine the identifications of multiple search engines to obtain confident results. We find that two search engines, pFind and Mascot, identify most of the confident results. This study will be beneficial to those interested in histone proteomics analysis.
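    The description does not say how the engines' identifications are combined. One simple scheme, shown here as a hypothetical sketch rather than the authors' procedure, keeps a spectrum only when every engine that identified it agrees on the modified peptide, and records which engines support it. The engine names come from the description; the scan IDs and peptide strings are made up.

```python
def combine_identifications(by_engine):
    """Merge per-engine spectrum identifications (illustrative scheme).

    by_engine maps an engine name to {spectrum_id: peptide_with_ptm}.
    Spectra on which engines disagree are dropped; the rest are kept with
    the sorted list of engines that support them.
    """
    merged = {}        # spectrum_id -> (peptide, [engines])
    conflicts = set()  # spectra with disagreeing assignments
    for engine, ids in by_engine.items():
        for spectrum, peptide in ids.items():
            if spectrum in conflicts:
                continue
            if spectrum in merged and merged[spectrum][0] != peptide:
                conflicts.add(spectrum)
                del merged[spectrum]
                continue
            merged.setdefault(spectrum, (peptide, []))[1].append(engine)
    return {s: (p, sorted(engines)) for s, (p, engines) in merged.items()}


# Hypothetical identifications from one single-PTM search round.
result = combine_identifications({
    "pFind":  {"scan_1": "KSTGGK(ac)", "scan_2": "KQLATK(me1)"},
    "Mascot": {"scan_1": "KSTGGK(ac)", "scan_2": "KQLATK(me2)", "scan_3": "TVALR"},
})
print(result)
```

    Here `scan_1` is supported by both engines, `scan_2` is discarded because the engines assign different methylation states, and `scan_3` is kept with Mascot as its only support; a stricter variant could require at least two supporting engines.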

  15. Data from: Comparative Evaluation of Proteome Discoverer and FragPipe for...

    • acs.figshare.com
    zip
    Updated Jun 3, 2023
    Tianen He; Youqi Liu; Yan Zhou; Lu Li; He Wang; Shanjun Chen; Jinlong Gao; Wenhao Jiang; Yi Yu; Weigang Ge; Hui-Yin Chang; Ziquan Fan; Alexey I. Nesvizhskii; Tiannan Guo; Yaoting Sun (2023). Comparative Evaluation of Proteome Discoverer and FragPipe for the TMT-Based Proteome Quantification [Dataset]. http://doi.org/10.1021/acs.jproteome.2c00390.s002
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tianen He; Youqi Liu; Yan Zhou; Lu Li; He Wang; Shanjun Chen; Jinlong Gao; Wenhao Jiang; Yi Yu; Weigang Ge; Hui-Yin Chang; Ziquan Fan; Alexey I. Nesvizhskii; Tiannan Guo; Yaoting Sun
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Isobaric labeling-based proteomics is widely applied in deep proteome quantification. Among the platforms for isobaric labeled proteomic data analysis, the commercial software Proteome Discoverer (PD), which incorporates the search engine CHIMERYS, is widely used, while FragPipe (FP) is relatively new, free for noncommercial purposes, and integrates the search engine MSFragger. Here, we compared PD and FP on three public proteomic data sets labeled using 6plex, 10plex, and 16plex tandem mass tags. Our results showed that the protein abundances generated by the two software packages are highly correlated. PD quantified more proteins than FP (10.02%, 15.44%, and 8.19% more) with comparable NA ratios (0.00% vs. 0.00%, 0.85% vs. 0.38%, and 11.74% vs. 10.52%) in the three data sets. Using the 16plex data set, PD and FP outputs showed high consistency in quantifying technical replicates, batch effects, and functional enrichment in differentially expressed proteins. However, FP saved 93.93%, 96.65%, and 96.41% of processing time compared to PD for the three data sets, respectively. In conclusion, while PD is well-maintained commercial software that integrates various additional functions and can quantify more proteins, FP is freely available and achieves similar output in a shorter computational time. Our results will guide users in choosing the most suitable quantification software for their needs.

  16. MSLR WEB30K Dataset

    • paperswithcode.com
    Updated Apr 14, 2025
    Tao Qin; Tie-Yan Liu (2025). MSLR WEB30K Dataset [Dataset]. https://paperswithcode.com/dataset/mslr-web30k
    Dataset updated
    Apr 14, 2025
    Authors
    Tao Qin; Tie-Yan Liu
    Description

    The datasets are machine learning data in which queries and URLs are represented by IDs. They consist of feature vectors extracted from query-URL pairs, along with relevance judgment labels:

    (1) The relevance judgments are obtained from a retired labeling set of a commercial web search engine (Microsoft Bing) and take 5 values, from 0 (irrelevant) to 4 (perfectly relevant).

    (2) The features were mainly extracted by us and are widely used in the research community.

    In the data files, each row corresponds to a query-URL pair. The first column is the relevance label of the pair, the second column is the query ID, and the following columns are features. The larger the relevance label, the more relevant the query-URL pair. A query-URL pair is represented by a 136-dimensional feature vector.
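    Rows in this layout can be read with a few lines of Python. The sketch assumes the SVMlight-style text encoding used by the LETOR/MSLR files, where the query ID is written as `qid:<id>` and each feature as `<index>:<value>` with 1-based indices; `parse_line` and `group_by_query` are illustrative helpers, not part of any distributed tooling.

```python
from collections import defaultdict

N_FEATURES = 136  # dimensionality stated in the dataset description


def parse_line(line: str, n_features: int = N_FEATURES):
    """Parse one 'label qid:ID idx:val ...' row into (label, qid, features)."""
    parts = line.split()
    label = int(parts[0])                # relevance judgment, 0..4
    qid = parts[1].split(":", 1)[1]      # query ID after the 'qid:' prefix
    features = [0.0] * n_features        # dense vector; absent indices stay 0
    for token in parts[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)  # assumed 1-based indices
    return label, qid, features


def group_by_query(lines):
    """Collect (label, features) pairs per query, as ranking models expect."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            label, qid, features = parse_line(line)
            groups[qid].append((label, features))
    return dict(groups)


rows = ["2 qid:10 1:3 2:0.5 136:7", "0 qid:10 1:1", "4 qid:11 3:2.25"]
groups = group_by_query(rows)
print(sorted(groups), [label for label, _ in groups["10"]])  # ['10', '11'] [2, 0]
```

    Grouping by query matters because learning-to-rank losses compare documents only within the same query, never across queries.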

  17. World - Twitter Sentiment By Country

    • kaggle.com
    Updated Nov 10, 2020
    William Jiang (2020). World - Twitter Sentiment By Country [Dataset]. https://www.kaggle.com/wjia26/twittersentimentbycountry/code
    Available download formats: Croissant
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    William Jiang
    License

    GNU General Public License v2.0 http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    World
    Description


    Introduction

    Ever wondered what people are saying about certain countries? Whether it's in a positive/negative light? What are the most commonly used phrases/words to describe the country? In this dataset I present tweets where a certain country gets mentioned in the hashtags (e.g. #HongKong, #NewZealand). It contains around 150 countries in the world. I've added an additional field called polarity which has the sentiment computed from the text field. Feel free to explore! Feedback is much appreciated!

    Content

    Each row represents a tweet. Tweet creation dates range from 12/07/2020 to 25/07/2020. I will update the data on a monthly cadence.

    • The country can be derived from the file_name field (this field is very Tableau-friendly when it comes to plotting maps).
    • The date at which the tweet was created is in the created_at field.
    • The search query used to query the Twitter search engine is in the search_query field.
    • The full tweet text is in the text field.
    • The sentiment is in the polarity field (I've used the VADER model from NLTK to compute this).

    Notes

    There may be slight duplication of tweet IDs before 22/07/2020. I have since fixed this bug.
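    With the fields above, per-country sentiment is a simple group-by on `polarity`, and the duplicate tweet IDs mentioned in the note can be dropped first. This sketch rests on assumptions: the ID column name `id` and a `file_name` holding the country name plus a file extension (e.g. `NewZealand.csv`) are hypothetical, since the exact formats are not documented here.

```python
from collections import defaultdict
from pathlib import PurePath


def mean_polarity_by_country(rows):
    """Average VADER polarity per country, skipping repeated tweet IDs.

    Assumes each row is a dict (e.g. from csv.DictReader) with hypothetical
    'id' and 'file_name' keys plus the documented 'polarity' field.
    """
    seen = set()
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["id"] in seen:
            continue  # duplicate tweet, see the note on IDs before 22/07/2020
        seen.add(row["id"])
        country = PurePath(row["file_name"]).stem  # 'NewZealand.csv' -> 'NewZealand'
        totals[country] += float(row["polarity"])
        counts[country] += 1
    return {country: totals[country] / counts[country] for country in totals}


rows = [
    {"id": "1", "file_name": "NewZealand.csv", "polarity": "0.5"},
    {"id": "1", "file_name": "NewZealand.csv", "polarity": "0.5"},  # duplicate
    {"id": "2", "file_name": "NewZealand.csv", "polarity": "-0.1"},
    {"id": "3", "file_name": "HongKong.csv", "polarity": "0.2"},
]
print(mean_polarity_by_country(rows))
```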

    Acknowledgements

    Thanks to the tweepy package for making the data extraction via Twitter API so easy.

    Shameless Plug

    Feel free to check out my blog if you want to learn how I built the data lake via AWS, or for other data shenanigans.

    Here's an App I built using a live version of this data.

  18. TyDi QA Extension Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Datasimple (2025). TyDi QA Extension Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/e343c299-31e3-47c9-9184-55b559067151
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Knowledge Bundles
    Description

    This dataset, known as Answerable-TyDiQA, is an extension of the GoldP subtask from the original TyDi QA. It serves as a valuable resource for training artificial intelligence models for natural language processing tasks. The collection comprises an extensive array of question-answer pairs, meticulously extracted from the Tashkeela Giclée Web Corpus. It offers researchers, developers, and data scientists a rich set of real-world scenarios for exploration in areas like language engineering and AI research.

    Columns

    The dataset typically includes the following columns:

    • question_text: This column contains the actual text of the questions asked. (String)
    • document_title: This column provides the title of the document associated with each question. (String)
    • language: This column indicates the language in which the question is posed. (String)
    • annotations: This column holds annotations pertinent to the question. (String)
    • document_plaintext: This column includes the plain text content of the document linked to the question. (String)
    • document_url: This column provides the URL of the document associated with the question. (String)

    Distribution

    The dataset is typically provided in CSV format, with a file such as train.csv containing the question-answer pairs. While specific total row counts are not explicitly stated, the dataset features a substantial number of unique values across its columns. For instance, there are 57,645 unique plain text documents and 35,185 unique document titles. The language distribution includes Arabic at approximately 26%, Finnish at 12%, and other languages collectively making up about 63% of the dataset.
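    The language distribution and unique-title counts quoted above can be recomputed from the CSV with the standard library. The sketch assumes a header row using the column names listed under Columns; `train.csv` is the file name given in the description, and the inline sample rows (including the lowercase language labels) are made up for illustration.

```python
import csv
import io
from collections import Counter


def language_shares(csv_file):
    """Return {language: fraction of rows}, most common first."""
    counts = Counter(row["language"] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}


def unique_titles(csv_file):
    """Number of distinct document_title values."""
    return len({row["document_title"] for row in csv.DictReader(csv_file)})


SAMPLE = (
    "question_text,document_title,language\n"
    "q1,Helsinki,finnish\n"
    "q2,Cairo,arabic\n"
    "q3,Cairo,arabic\n"
    "q4,Nile,english\n"
)
print(language_shares(io.StringIO(SAMPLE)))
# {'arabic': 0.5, 'finnish': 0.25, 'english': 0.25}

# For the real file:
# with open("train.csv", newline="", encoding="utf-8") as f:
#     print(language_shares(f))
```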

    Usage

    This dataset is ideal for various applications, including:

    • AI-based question answering systems: It can be used to train and test AI models to understand question formatting, language usage, and potential answer identification.
    • Natural language processing research: Researchers can leverage this data to identify language usage trends and extract insights for developing advanced applications such as sentiment analysis or machine translation.
    • Search engine optimisation (SEO): Businesses can utilise the dataset to craft content based on commonly asked questions and answers, potentially improving their organic search engine rankings.

    Coverage

    The dataset has a global scope, drawing from a variety of linguistic sources. No specific time range or demographic scope is provided, but its multi-language nature suggests broad applicability.

    License

    CC0

    Who Can Use It

    The dataset is primarily intended for:

    • AI researchers: For exploring and gaining insights into AI language understanding.
    • Language engineers: For developing and refining language-related technologies.
    • NLP enthusiasts: For experimenting with question answering, information extraction, and text summarisation tasks.

    Dataset Name Suggestions

    • Multi-Lingual QA Pairs
    • Answerable NLP Corpus
    • Global Question-Answer Dataset
    • TyDi QA Extension Dataset
    • Web Corpus Q&A

    Attributes

    Original Data Source: TyDi QA (Questions & Answers in 11 Languages)

  19. Description of the real-world dataset.

    • plos.figshare.com
    xls
    Updated Jun 27, 2023
    Fadi K. Dib; Peter Rodgers (2023). Description of the real-world dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0287744.t010
    Available download formats: xls
    Dataset updated
    Jun 27, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Fadi K. Dib; Peter Rodgers
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Graph drawing, the automatic layout of graphs, is vital for clear data visualization and interpretation, but it is challenging because it requires optimizing a multi-metric objective function, an area where current search-based methods leave room for improvement. In this paper, we investigate the performance of the Jaya algorithm for automatic graph layout with straight-line edges. The Jaya algorithm has not previously been used in the field of graph drawing. Unlike most population-based methods, Jaya is parameter-less: it requires no algorithm-specific control parameters, and only the population size and number of iterations need to be specified, which makes it easy for researchers to apply. To improve its performance, we applied Latin Hypercube Sampling to initialize the population so that individuals cover the search space widely. We developed a visualization tool that simplifies the integration of search methods, allowing easy performance testing of algorithms on graphs with weighted aesthetic metrics. We benchmarked the Jaya algorithm and its enhanced version against Hill Climbing and Simulated Annealing, commonly used graph-drawing search algorithms with a limited number of parameters, to demonstrate Jaya's effectiveness in the field. We conducted experiments on synthetic datasets with varying numbers of nodes and edges generated with the Erdős–Rényi model, as well as on real-world graph datasets, and evaluated the quality of the generated layouts and the performance of the methods based on the number of function evaluations. We also conducted a scalability experiment to evaluate Jaya's ability to handle large-scale graphs. Our results showed that the Jaya algorithm significantly outperforms Hill Climbing and Simulated Annealing in both the quality of the generated graph layouts and the speed at which the layouts were produced. Using improved population sampling generated better layouts than the original Jaya algorithm with the same number of function evaluations. Moreover, the Jaya algorithm was able to draw layouts for graphs with 500 nodes in a reasonable time.
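    The core Jaya update, as commonly stated, moves each candidate toward the current best solution and away from the current worst, and keeps the move only if it improves the objective. The only settings are population size and iteration count. The sketch below pairs that rule with Latin Hypercube initialization. It minimizes a generic continuous function within box bounds. The sphere function is a stand-in, because the paper's weighted aesthetic-metric objective for graph layouts is not reproduced here, and all function names are illustrative.

```python
import random


def latin_hypercube(n, dim, lo, hi, rng):
    """n points in [lo, hi]^dim with exactly one point per stratum per axis."""
    points = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # independent permutation for each dimension
        for i in range(n):
            u = (strata[i] + rng.random()) / n  # uniform within the stratum
            points[i][d] = lo + u * (hi - lo)
    return points


def jaya(objective, dim, lo, hi, pop_size=20, iterations=200, seed=0):
    """Minimize `objective` with the Jaya update and greedy acceptance."""
    rng = random.Random(seed)
    pop = latin_hypercube(pop_size, dim, lo, hi, rng)
    fit = [objective(x) for x in pop]
    for _ in range(iterations):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for k in range(pop_size):
            x = pop[k]
            cand = []
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # x' = x + r1 (best - |x|) - r2 (worst - |x|)
                v = x[d] + r1 * (best[d] - abs(x[d])) - r2 * (worst[d] - abs(x[d]))
                cand.append(min(max(v, lo), hi))  # clip to the search box
            f = objective(cand)
            if f < fit[k]:  # keep the move only if it improves
                pop[k], fit[k] = cand, f
    i = min(range(pop_size), key=fit.__getitem__)
    return pop[i], fit[i]


def sphere(x):
    """Stand-in objective with its minimum at the origin."""
    return sum(v * v for v in x)


# iterations=0 returns the best Latin Hypercube sample for the same seed.
_, initial_best = jaya(sphere, dim=5, lo=-5.0, hi=5.0, iterations=0, seed=1)
x_best, f_best = jaya(sphere, dim=5, lo=-5.0, hi=5.0, iterations=200, seed=1)
print(initial_best, f_best)
```

    For graph drawing, the decision vector would hold the node coordinates and the objective would be the weighted sum of layout aesthetics. Greedy acceptance guarantees that the best fitness never gets worse from one iteration to the next.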

  20. Data_Sheet_1_Genetic Privacy and Data Protection: A Review of Chinese...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated Jun 3, 2023
    Li Du; Meng Wang (2023). Data_Sheet_1_Genetic Privacy and Data Protection: A Review of Chinese Direct-to-Consumer Genetic Test Services.PDF [Dataset]. http://doi.org/10.3389/fgene.2020.00416.s001
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Li Du; Meng Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    The existing literature has not examined how Chinese direct-to-consumer (DTC) genetic testing providers navigate the issues of informed consent, privacy, and data protection associated with testing services. This research explores these questions by examining the relevant documents and messages published on the websites of Chinese DTC genetic test providers.

    Methods

    Using Baidu.com, the most popular Chinese search engine, we compiled the websites of providers who offer genetic testing services and analyzed available documents related to informed consent, terms of service, and privacy policy. The analyses were guided by the following inquiries as they applied to each DTC provider: the methods available for purchasing testing products; the methods providers used to obtain informed consent; privacy issues and measures for protecting consumers’ health information; the policy for third-party data sharing; consumers’ rights to their data; and liabilities in the event of a data breach.

    Results

    68.7% of providers offer multiple channels for purchasing genetic testing products, and social media has become a popular platform for promoting testing services. Informed consent forms are not available on 94% of providers’ websites, and a privacy policy is offered by only 45.8% of DTC genetic testing providers. Thirty-nine providers stated that they used measures to protect consumers’ information; of these, 29 distinguished consumers’ general personal information from their genetic information. In 33.7% of the cases examined, providers stated that, with consumers’ explicit permission, they could reuse and share clients’ information for non-commercial purposes. Twenty-three providers granted consumers rights to their health information, the most frequently mentioned being the right to decide how their data can be used by providers. Lastly, 21.7% of providers clearly stated their liabilities in the event of a data breach, placing more emphasis on the providers’ exemption from any liability.

    Conclusions

    Currently, the Chinese DTC genetic testing business is running in a regulatory vacuum, governed by self-regulation. The government should develop a comprehensive legal framework to regulate DTC genetic testing offerings. Regulatory improvements should be made based on periodic reviews of the supervisory strategy to keep pace with the rapid development of the DTC genetic testing industry.
