34 datasets found
  1. Relato Business Graph Database

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). Relato Business Graph Database [Dataset]. https://www.kaggle.com/datasets/thedevastator/relato-business-network-graph-373663-domain-conn/code
    Explore at:
    Available download formats: zip (17698362 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Relato Business Graph Database

    Visualizing Company Relationships & Market Trends

    By Russell Jurney [source]

    About this dataset

    This dataset contains 373,663 links between businesses pulled from the web, providing an insightful reflection of intercompany relationships. It is sourced from a graph database hosted by the startup Relato, the origins of which stem from a Turk system in which users would log partnerships, example customers and more onto an autocomplete form over a database.

    These edges are defined primarily using domain names as unique IDs - allowing for information to be collated without succumbing to the extremely complex problem of entity resolution of companies. This data was used to build lead generation systems and market visualization systems – enabling powerful insights into patterns emerging from business relations.

    This largely unstructured dataset gives rise to many questions: How does the business graph operate? How do companies relate to one another? What other problems can this dataset be used for and how else can it be extended? By exploring such questions with this set we can enrich our understanding of corporate connections and discover potential opportunities for further research and marketing efforts.

    Get involved with this project: Contribute new edges; add metadata about companies; or analyze this substantial source material alone or in conjunction with various other public datasets!


    How to use the dataset

    This dataset is a great resource for researchers and practitioners interested in understanding the business relationships between companies. The dataset contains 373,663 links between companies, including information about the type of link, time it was updated, and domains of the two companies. This can be used to identify new potential business partners or competitors by understanding connections within industries or networks of customers.

    To get started, explore the columns provided in the dataset: update_time (the timestamp when an entry was last updated), username (the user who last updated the entry), home_name (the name of the home company mentioned in the link), link_name (the name of the linked company), type (the type of relationship between the home and linked company, such as a partnership), home_domain (the domain of the home company where the relationship was found during mining), and link_domain (the domain of the linked company). You can also load the data into a visualization tool such as Gephi to understand connection patterns. Say you want to explore connections within a certain industry: you could filter entries for that industry by column values. For example, to understand how tech companies are connected, filter on values like 'Technology', 'Telecoms', and related terms across the columns, then leverage details about partners, customers, and potential investors from the linkage points across different firms.
    Graph analysis on your subset, such as identifying the most connected firms, can point to potential areas or markets for collaboration within your sector, or reveal emerging trends and risks around interdependencies among sectors over time. Examining the consistency of update times may also let you track shifting relationship dynamics from one quarter to the next. Finally, always double-check whether observed patterns are supported by enough evidence: relationship data gathered manually from the web may be inaccurate or out of date, so be mindful of the errors commonly introduced by manual extraction from internet sources.
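    The filter-then-analyze workflow described above can be sketched with pandas. This is a minimal sketch: the toy rows, the `type` value "partner", and the domain names are assumptions for illustration, not values taken from the dataset documentation.

```python
import pandas as pd

# Toy rows mimicking the columns described above; the real file has
# ~373k rows and the exact values here are illustrative assumptions.
edges = pd.DataFrame({
    "home_domain": ["acme.com", "acme.com", "globex.com"],
    "link_domain": ["globex.com", "initech.com", "initech.com"],
    "type": ["partner", "customer", "partner"],
})

# 1. Filter to a single relationship type before analysis.
partners = edges[edges["type"] == "partner"]

# 2. Count in-links per domain: the most-linked domains are candidate
#    "hub" companies worth inspecting further in a tool like Gephi.
in_links = partners["link_domain"].value_counts()
print(in_links.to_dict())
```

    Because domains serve as unique company IDs, this kind of counting works without any entity resolution step.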

    Hopefully these tips provide a starting point for exploring such graphs. Have fun data hunting!

    Research Ideas

    • Building an AI/Machine learning powered sales lead manager. By analyzing the graph, this tool could provide insights about a company's potential customers, partners, competitors and suppliers.
    • Creating a visualization tool for marketers to understand their markets better by visualizing the connections between companies in different industries and sectors.
    • Creating an interactive web-based search engine that allows users to quickly ...
  2. Search Engine Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Search Engine Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/search-engine-marketing-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2025 - 2034
    Area covered
    Global
    Description

    Search Engine Market Outlook



    The search engine market size was valued at approximately USD 124 billion in 2023 and is projected to reach USD 258 billion by 2032, witnessing a robust CAGR of 8.5% during the forecast period. This growth is largely attributed to the increasing reliance on digital platforms and the internet across various sectors, which has necessitated the use of search engines for data retrieval and information dissemination. With the proliferation of smartphones and the expansion of internet access globally, search engines have become indispensable tools for both businesses and consumers, driving the market's upward trajectory. The integration of artificial intelligence and machine learning technologies into search engines is transforming the way they operate, offering more personalized and efficient search results, thereby further propelling market growth.



    One of the primary growth factors in the search engine market is the ever-increasing digitalization across industries. As businesses continue to transition from traditional modes of operation to digital platforms, the need for search engines to navigate and manage data becomes paramount. This shift is particularly evident in industries such as retail, BFSI, and healthcare, where vast amounts of data are generated and require efficient management and retrieval systems. The integration of AI and machine learning into search engine algorithms has enhanced their ability to process and interpret large datasets, thereby improving the accuracy and relevance of search results. This technological advancement not only improves user experience but also enhances the competitive edge of businesses, further fueling market growth.



    Another significant growth factor is the expanding e-commerce sector, which relies heavily on search engines to connect consumers with products and services. With the rise of e-commerce giants and online marketplaces, consumers are increasingly using search engines to find the best prices, reviews, and availability of products, leading to a surge in search engine usage. Additionally, the implementation of voice search technology and the growing popularity of smart home devices have introduced new dynamics to search engine functionality. Consumers are now able to conduct searches verbally, which has necessitated the adaptation of search engines to incorporate natural language processing capabilities, further driving market growth.



    The advertising and marketing sectors are also contributing significantly to the growth of the search engine market. Businesses are leveraging search engines as a primary tool for online advertising, given their wide reach and ability to target specific audiences. Pay-per-click advertising and search engine optimization strategies have become integral components of digital marketing campaigns, enabling businesses to enhance their visibility and engagement with potential customers. The measurable nature of these advertising techniques allows businesses to assess the effectiveness of their campaigns and make data-driven decisions, thereby increasing their reliance on search engines and contributing to overall market growth.



    The evolution of search engines is closely tied to the development of AI Enterprise Search, which is revolutionizing how businesses access and utilize information. AI Enterprise Search leverages artificial intelligence to provide more accurate and contextually relevant search results, making it an invaluable tool for organizations that manage large volumes of data. By understanding user intent and learning from past interactions, AI Enterprise Search systems can deliver personalized experiences that enhance productivity and decision-making. This capability is particularly beneficial in sectors such as finance and healthcare, where quick access to precise information is crucial. As businesses continue to digitize and data volumes grow, the demand for AI Enterprise Search solutions is expected to increase, further driving the growth of the search engine market.



    Regionally, North America holds a significant share of the search engine market, driven by the presence of major technology companies and a well-established digital infrastructure. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. This growth can be attributed to the rapid digital transformation in emerging economies such as China and India, where increasing internet penetration and smartphone adoption are driving demand for search engines. Additionally, government initiatives to

  3. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    Updated Mar 1, 2023
    Cite
    Haak, Fabian; Schaer, Philipp (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Explore at:
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Technische Hochschule Köln
    Authors
    Haak, Fabian; Schaer, Philipp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin, bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label (left, right, or neutral) by four expert annotators based on the expressed political partisanship. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.
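    As a quick sanity check on the class balance reported above, the label distribution can be tallied with pandas. The column name `bias_rating` is an assumption; check the header of allsides_balanced_news_headlines-texts.csv before relying on it.

```python
import pandas as pd

# Toy rows standing in for the 21,747-article CSV; the column name
# "bias_rating" is an assumption, not confirmed by the dataset docs.
df = pd.DataFrame({
    "headline": ["a", "b", "c", "d"],
    "bias_rating": ["left", "right", "center", "left"],
})

# Tally articles per bias label, analogous to the reported
# 10,273 left / 7,222 right / 4,252 center split.
counts = df["bias_rating"].value_counts()
print(counts.to_dict())
```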

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times (approximately half of the total number of topics). The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server, which is recorded in "location".

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
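    A minimal sketch of slicing suggestions.csv by engine and extended input, using the columns described above. The toy rows are illustrative assumptions shaped like the documented schema, not actual records from the dataset.

```python
import pandas as pd

# Illustrative rows following the documented columns of suggestions.csv.
df = pd.DataFrame({
    "root_term": ["democrats", "democrats", "democrats"],
    "query_input": ["democrats", "democrats a", "democrats a"],
    "query_suggestion": [
        "democrats 2022", "democrats and recession", "democrats approval",
    ],
    "rank": [1, 1, 2],
    "search_engine": ["google", "google", "bing"],
    "datetime": pd.to_datetime(["2022-11-01"] * 3),
    "location": ["US"] * 3,
})

# Compare what Google vs. Bing suggested for the same extended root query.
by_engine = (
    df[df["query_input"] == "democrats a"]
    .groupby("search_engine")["query_suggestion"]
    .apply(list)
)
print(by_engine.to_dict())
```

    The same grouping applied per root_term would support, for instance, comparing suggestion overlap between the two engines across all 1,431 topics.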

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.

  4. Fake Activity Market

    • kaggle.com
    zip
    Updated Mar 24, 2026
    Cite
    Maxim Kolomeets (2026). Fake Activity Market [Dataset]. https://www.kaggle.com/datasets/guardeec/fake-activity-market
    Explore at:
    Available download formats: zip (718731 bytes)
    Dataset updated
    Mar 24, 2026
    Authors
    Maxim Kolomeets
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset captures websites that sell fake social-media engagement (fake likes, comments, reviews, etc.). It now includes two scans (2025 and 2026), enabling longitudinal analysis of the fake activity market.

    The first scan (2025) contains 881 Fake Activity Shops (FAS), while the second scan (2026) contains 793 FAS. These are derived from large-scale automated crawling of search engines and represent validated storefronts selling fake engagement.

    The dataset is based on tens of thousands of extracted offers, aggregated into domain-level median “platform×action” unit prices (USD/unit), where platform is Instagram, Facebook, etc., and action is like, comment, etc. Each row represents one domain/snapshot and includes additional operational and technical signals from downloaded HTML and passive enrichment (site structure, payment mentions, automation/AI claims, technology stack, and lightweight security/reputation indicators).
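    With one row per domain/snapshot, a natural first analysis is comparing median unit prices between the 2025 and 2026 scans. This is a sketch under assumptions: the column names (site_id, scan, instagram_like_usd) and prices below are hypothetical stand-ins for whatever platform×action columns the released file actually uses.

```python
import pandas as pd

# Hypothetical rows: one per (domain, scan); prices in USD per unit.
# Column names are assumptions; check the actual file header.
df = pd.DataFrame({
    "site_id": ["fas_001", "fas_001", "fas_002"],
    "scan": [2025, 2026, 2025],
    "instagram_like_usd": [0.012, 0.015, 0.020],
})

# Median Instagram-like price per scan: a market-level view of how
# unit prices drifted between the two snapshots.
trend = df.groupby("scan")["instagram_like_usd"].median()
print(trend.to_dict())
```

    The same groupby pattern extends to any platform×action column, or to per-country slices once the SERP-derived discoverability fields are joined in.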

    Unit: FAS domain
    Scans: 2025 and 2026

    Country coverage: 28 European countries — Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, United Kingdom

    Search engine coverage: 3 search engines — Google, Bing, DuckDuckGo

    FAS domains:
    - 2025 scan: 881
    - 2026 scan: 793

    Platforms: 60 unique platforms

    View the interactive map based on the dataset: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F891535%2Fbefad8b2eb985ae384eb97f81f8d9094%2Fdiscoverability.png?generation=1764866522699409&alt=media

    Important notes

    • Measurement dataset: Automated discovery/classification/extraction can produce false positives/negatives; treat fields as measurement indicators, not proof of illegality, intent, or liability.
    • Observed discoverability: If a shop does not appear in a country’s SERP signals, this means not observed under our crawl setup, not necessarily that it does not serve that country.
    • Public-release identifiers: Raw domains/URLs are replaced by a stable pseudonymous identifier (site_id); the domain↔ID mapping is not released in the public file.

    Sensitive Fields Anonymization

    • Registrant emails removed: whois_owner_email is dropped. Instead, the dataset provides whois_owner_email_domain, whois_owner_email_hash (deterministic hash of local part with salt), and whois_owner_email_is_freemail (True/False).
    • Owner organization pseudonymization: if whois_owner_org likely refers to a natural person/sole trader, the value is replaced with REDACTED_PERSON and whois_owner_org_hash is added; legal entities are kept and flagged via whois_owner_org_is_legal_entity=True.
    • Reverse DNS: dynamic/consumer PTR patterns are cleared; reverse_dns_redacted indicates redaction.
    • Lookyloo capture IDs removed: lookyloo_capture_uuid is not included.

    Requesting access to non-anonymised data

    A non-anonymised version of the dataset (e.g., raw domains/URLs and other sensitive identifiers) may be shared with qualified researchers and policymakers under appropriate safeguards. To request access, please email Maksim.Kalameyets@newcastle.ac.uk and include:

    • your name and affiliation
    • a brief description of the purpose and why non-anonymised fields are required

    FAMOUS project

    This dataset is part of the FAMOUS Project — Fake Activity Market Observation System of Unethical Services. FAMOUS investigates the online economy of Fake Activity Shops (FAS): commercial websites selling fake social-media engagement, such as followers, likes, comments, views, and other inauthentic interactions.

    Project Goals:

    • Map and monitor Fake Activity Shops across Europe
    • Analyse market structures, pricing, strategies, and levels of sophistication
    • Support regulators, researchers, platforms, and policy experts
    • Provide datasets, APIs, visualisations, benchmarks, and methodological guidelines
    • Increase public awareness and trust in digital ecosystems

    Funding

    This work is supported by the European Media and Information Fund (EMIF), which funds research improving media integrity, combating misinformation, and strengthening trust in Europe’s digital ecosystem.

  5. Data from: Inventory of online public databases and repositories holding...

    • agdatacommons.nal.usda.gov
    • data.wu.ac.at
    txt
    Updated Feb 8, 2024
    + more versions
    Cite
    Erin Antognoli; Jonathan Sears; Cynthia Parr (2024). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. http://doi.org/10.15482/USDA.ADC/1389839
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Erin Antognoli; Jonathan Sears; Cynthia Parr
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to

    • establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
    • compare how much data is in institutional vs. domain-specific vs. federal platforms
    • determine which repositories are recommended by top journals that require or recommend the publication of supporting data
    • ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.
    Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms.
    From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review:

    Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See the included README file for descriptions of each individual data file in this dataset. Resources in this dataset:

    • Journals (file: Journals.csv)
    • Journals - Recommended repositories (file: Repos_from_journals.csv)
    • TDWG presentation (file: TDWG_Presentation.pptx)
    • Domain Specific ag data sources (file: domain_specific_ag_databases.csv)
    • Data Dictionary for Ag Data Repository Inventory (file: Ag_Data_Repo_DD.csv)
    • General repositories containing ag data (file: general_repos_1.csv)
    • README and file inventory (file: README_InventoryPublicDBandREepAgData.txt)

  6. Adzuna online job listings 2016-2024

    • dtechtive.com
    Updated Jul 25, 2024
    Cite
    Adzuna (2024). Adzuna online job listings 2016-2024 [Dataset]. https://dtechtive.com/datasets/45868
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    Adzuna (http://adzuna.co.uk/)
    Area covered
    United Kingdom
    Description

    This dataset contains approximately 350 million job adverts drawn from weekly snapshots of Adzuna, the UK's most popular vacancy search engine. Adzuna searches thousands of websites to bring together information on millions of advertisements on its service. The dataset consists of full point-in-time snapshots with details of all advertisements that were live on adzuna.co.uk. It features derived variables on job location, industry and occupation classification, salary, seniority, and skills extracted from each advert. It also includes the raw text description of each advert, offering rich insights into a range of economic and labour market issues.

    Access and restrictions

    • Please note that UBDC can support a maximum of 9 research projects per year. We make periodic calls inviting users to apply in order to allocate these in a fair and transparent manner. Unlike most other data collections, we do not provide access on a responsive basis. In general, each user will have two years to complete their research project.
    • The dataset is available for non-commercial academic research projects only. Academic staff and PhD students from UK universities may apply.
    • Commercial enterprises and non-research functions within any academic institution (e.g. careers services) will not be considered for access to the data.

    Calls for expressions of interest are now open

    To register your interest in accessing this data, fill in the form by clicking the "request data" button at the top of the page. Completed Adzuna proposal application forms must be sent to UBDC by the deadline of Monday 29 July 2024.

    Eligibility

    Applications are encouraged from researchers looking to undertake academic projects with strong potential for scientific and public-benefit impact. Many of these are likely to be in fields connected with the labour market and the economy, but applications from any field will be considered. Methodological studies are eligible, as are those with a substantive focus. Applicants may propose combining Adzuna data with other sources, provided access to all data sources can be confirmed before the Adzuna licence is agreed. Projects are expected to run for up to two years in most cases, including all time needed to complete published outputs, although UBDC may be able to support longer projects. The licences permit use of the data by named researchers for research purposes, but not for teaching.

    Assessment criteria

    UBDC has a limited number of licences, so the process of allocating them will be competitive. Proposals will be assessed against the following criteria:

    1. The potential for an original scientific contribution, substantive or methodological, in any field or discipline;
    2. The potential for wider impacts on society, the economy or the environment;
    3. The willingness of the research team to participate in activities to build a community of researchers using these data, add to knowledge about the data, and share resources including code.

    UBDC may also select projects so as to achieve some balance in terms of the subjects, fields or disciplines covered, and the career stage of the researchers. Successful projects will be listed on UBDC's website.

    Application procedure

    The application procedure has two stages:

    1. Interested applicants should complete the project summary form with information about the researchers, a project outline, data requirements, proposed outputs and outcomes, and timescales. UBDC will email the full Adzuna project proposal form to eligible applicants, who will be required to complete and return it by the deadline of Monday 29 July 2024.
    2. UBDC staff will verify that the usage will be in accordance with the terms and conditions of the licence, and that the spatial and temporal coverage of the dataset is likely to meet the project requirements. Depending on the number of applications, UBDC research staff may conduct an initial assessment against the criteria noted above. Project proposal forms will be sent to shortlisted researchers. The deadline for submission of the project proposal form is 29 July 2024.

    Depending on demand in this round, UBDC may issue a second call for expressions of interest later this year.

    Selection procedure

    UBDC staff, including senior researchers, will review the proposal forms at the first and second stages and agree the selection. All applicants will be informed of the outcome of their application. UBDC's decision will be final and not subject to appeal. Successful applicants will be required to complete the UBDC Access Agreement and an End User Licence Agreement to ensure compliance with the dataset's terms and conditions before the data is shared.

    Transfer of data

    On receipt of the signed licence agreements, the UBDC Data Service will transfer the files securely to the applicant. Data must be stored on password-protected university systems, and access must be limited to those named on the proposal. The Data Service team will provide ongoing support to ensure applicants can download and access the files and understand and utilise the data. The team will contact researchers to request information on progress, as well as on outcomes and outputs enabled by the data, including policy papers, publications and presentations at conferences. UBDC encourages sharing experiences of using the data with the wider research community, and future webinars will be scheduled to facilitate knowledge-sharing. We look forward to receiving your applications via the Project Summary Form.

  7. The State of Serverless Applications: Collection, Characterization, and...

    • zenodo.org
    zip
    Updated Aug 12, 2021
    Cite
    Simon Eismann; Joel Scheuner; Erwin van Eyk; Maximilian Schwinger; Johannes Grohmann; Nikolas Herbst; Cristina Abad (2021). The State of Serverless Applications: Collection, Characterization, and Community Consensus - Replication Package [Dataset]. http://doi.org/10.5281/zenodo.5185055
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 12, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simon Eismann; Joel Scheuner; Erwin van Eyk; Maximilian Schwinger; Johannes Grohmann; Nikolas Herbst; Cristina Abad
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The replication package for our article The State of Serverless Applications: Collection, Characterization, and Community Consensus provides everything required to reproduce all results for the following three studies:

    • Serverless Application Collection
    • Serverless Application Characterization
    • Comparison Study

    Serverless Application Collection

    We collect descriptions of serverless applications from open-source projects, academic literature, industrial literature, and scientific computing.

    Open-source Applications

    As a starting point, we used an existing data set on open-source serverless projects from this study. We removed small and inactive projects based on the number of files, commits, contributors, and watchers. Next, we manually filtered the resulting data set to include only projects that implement serverless applications. We provide a table containing all projects that remained after the filtering alongside the notes from the manual filtering.
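The size/activity filtering described above might be sketched as follows. This is an illustrative sketch only: the column names and thresholds are hypothetical, not taken from the study.

```python
import pandas as pd

# Hypothetical project metadata; the study's actual thresholds and
# column names are not specified in this description.
projects = pd.DataFrame({
    "name": ["proj-a", "proj-b", "proj-c"],
    "files": [120, 3, 45],
    "commits": [300, 5, 80],
    "contributors": [6, 1, 3],
    "watchers": [150, 2, 40],
})

# Keep projects that clear minimum size/activity thresholds on all four signals.
active = projects[
    (projects["files"] >= 10)
    & (projects["commits"] >= 20)
    & (projects["contributors"] >= 2)
    & (projects["watchers"] >= 10)
]
print(list(active["name"]))  # ['proj-a', 'proj-c']
```

The surviving projects would then go through the manual filtering step described above.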

    Academic Literature Applications

    We based our search on an existing community-curated dataset on literature for serverless computing consisting of over 180 peer-reviewed articles. First, we filtered the articles based on title and abstract. In a second iteration, we filtered out any articles that implement only a single function for evaluation purposes or do not include sufficient detail to enable a review. As the authors were familiar with some additional publications describing serverless applications, we contributed them to the community-curated dataset and included them in this study. We provide a table with our notes from the manual filtering.

    Scientific Computing Applications

    Most of these scientific computing serverless applications are still at an early stage and therefore there is little public data available. One of the authors is employed at the German Aerospace Center (DLR) at the time of writing, which allowed us to collect information about several projects at DLR that are either currently moving to serverless solutions or are planning to do so. Additionally, an application from the German Electron Synchrotron (DESY) could be included. For each of these scientific computing applications, we provide a document containing a description of the project and the names of our contacts that provided information for the characterization of these applications.

    • SC1 Copernicus Sentinel-1 for near-real-time water monitoring
    • SC2 Reprocessing Sentinel 5 Precursor data with ProEO
    • SC3 High-Performance Data Analytics for Earth Observation
    • SC4 Tandem-L exploitation platform
    • SC5 Global Urban Footprint
    • SC6 DESY - High Throughput Data Taking

    Collection of serverless applications

    Based on the previously described methodology, we collected a diverse dataset of 89 serverless applications from open-source projects, academic literature, industrial literature, and scientific computing. This dataset can be found in Dataset.xlsx.

    Serverless Application Characterization

    As previously described, we collected 89 serverless applications from four different sources. Subsequently, two randomly assigned reviewers out of seven available reviewers characterized each application along 22 characteristics in a structured collaborative review sheet. The characteristics and potential values were defined a priori by the authors and iteratively refined, extended, and generalized during the review process. The initial moderate inter-rater agreement was followed by a discussion and consolidation phase, where all differences between the two reviewers were discussed and resolved. The six scientific applications were not publicly available and therefore characterized by a single domain expert, who is either involved in the development of the applications or in direct contact with the development team.

    Initial Ratings & Interrater Agreement Calculation

    The initial reviews are available as a table, where every application is characterized along the 22 characteristics. A single value indicates that both reviewers assigned the same value, whereas a value of the form [Reviewer 2] A | [Reviewer 4] B indicates that for this characteristic, reviewer two assigned the value A, whereas reviewer four assigned the value B.

    Our script for the calculation of the Fleiss' kappa score based on this data is also publicly available. It requires the Python packages pandas and statsmodels. It does not require any input and assumes that the file Initial Characterizations.csv is located in the same folder. It can be executed as follows:

    python3 CalculateKappa.py
    
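The script itself is not reproduced here, but the core of such a computation might be sketched as below. The ratings matrix is illustrative; the real script reads the two reviewers' assignments from Initial Characterizations.csv.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: rows = applications, columns = the two assigned
# reviewers, values = the category each reviewer chose for one characteristic.
ratings = np.array([
    [0, 0],
    [0, 1],
    [1, 1],
    [1, 1],
    [2, 0],
    [2, 2],
])

# aggregate_raters converts a subjects-by-raters label matrix into a
# subjects-by-categories count table, the input format fleiss_kappa expects.
table, categories = aggregate_raters(ratings)
kappa = fleiss_kappa(table)
print(f"Fleiss' kappa: {kappa:.3f}")
```

With this toy data the agreement is moderate; the study reports a similarly moderate initial agreement before the consolidation phase.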

    Results Including Unknown Data

    In the following discussion and consolidation phase, the reviewers compared their notes and tried to reach a consensus for the characteristics with conflicting assignments. In a few cases, the two reviewers had different interpretations of a characteristic. These conflicts were discussed among all authors to ensure that characteristic interpretations were consistent. However, for most conflicts, the consolidation was a quick process as the most frequent type of conflict was that one reviewer found additional documentation that the other reviewer did not find.

    For six characteristics, many applications were assigned the ''Unknown'' value, i.e., the reviewers were not able to determine the value of this characteristic. Therefore, we excluded these characteristics from this study. For the remaining characteristics, the percentage of ''Unknowns'' ranges from 0–19% with two outliers at 25% and 30%. These ''Unknowns'' were excluded from the percentage values presented in the article. As part of our replication package, we provide the raw results for each characteristic including the ''Unknown'' percentages in the form of bar charts.

    The script for the generation of these bar charts is also part of this replication package. It uses the Python packages pandas, numpy, and matplotlib. It does not require any input and assumes that the file Dataset.csv is located in the same folder. It can be executed as follows:

    python3 GenerateResultsIncludingUnknown.py
    
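A minimal sketch of how such per-characteristic bar charts (including the ''Unknown'' share) might be produced is shown below. The characteristic name and values are illustrative; the actual Dataset.csv schema is not reproduced here.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical characteristic column with some 'Unknown' assignments.
df = pd.DataFrame({
    "Trigger Type": ["HTTP", "HTTP", "Queue", "Unknown", "Timer", "HTTP"],
})

# Percentage of applications per value, keeping the 'Unknown' share visible.
counts = df["Trigger Type"].value_counts(normalize=True) * 100

fig, ax = plt.subplots()
counts.plot.bar(ax=ax)
ax.set_ylabel("Share of applications (%)")
ax.set_title("Trigger Type (including Unknown)")
fig.savefig("trigger_type.png")
```

Dropping the ''Unknown'' rows before `value_counts` would yield the percentages as they are presented in the article.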

    Final Dataset & Figure Generation

    Following the consolidation process described above, we were able to resolve all conflicts, resulting in a collection of 89 applications described by 18 characteristics. This dataset is available here: link

    The script to generate all figures shown in the chapter "Serverless Application Characterization" can be found here. It does not require any input but assumes that the file Dataset.csv is located in the same folder. It uses the Python packages pandas, numpy, and matplotlib. It can be executed as follows:

    python3 GenerateFigures.py
    

    Comparison Study

    To identify existing surveys and datasets that also investigate one of our characteristics, we conducted a literature search using Google as our search engine, as we were mostly looking for grey literature. We used the following search term:

    ("serverless" OR "faas") AND ("dataset" OR "survey" OR "report") after: 2018-01-01
    

    This search term looks for any combination of either serverless or faas alongside any of the terms dataset, survey, or report. We further limited the search to any articles after 2017, as serverless is a fast-moving field and therefore any older studies are likely outdated already. This search term resulted in a total of 173 search results. In order to validate if using only a single search engine is sufficient, and if the search term is broad enough, we

  8. RuBQ 2.0

    • kaggle.com
    zip
    Updated Aug 9, 2021
    Cite
    Valentin Biryukov (2021). RuBQ 2.0 [Dataset]. https://www.kaggle.com/valentinbiryukov/rubq-20
    Explore at:
    zip(15459314 bytes)Available download formats
    Dataset updated
    Aug 9, 2021
    Authors
    Valentin Biryukov
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    RuBQ 2.0: An Innovated Russian Question Answering Dataset

    Introduction

    We present the second version of RuBQ. The dataset extension is based on questions obtained through search engine query suggestion services. The dataset has doubled in size: RuBQ 2.0 contains 2,910 questions along with their answers and SPARQL queries. Starting with limited manual data preparation, we carried out most of the annotation using crowdsourcing and automated routines. We expanded the dataset with machine reading comprehension capabilities: RuBQ 2.0 incorporates answer-bearing paragraphs from Wikipedia for the majority of questions. Thus, the dataset is now suitable not only for the evaluation of KBQA, but also for evaluating machine reading comprehension, paragraph retrieval, and end-to-end open-domain question answering. The dataset can also be used for experiments in hybrid QA, where KBQA and text-based QA can enrich and complement each other.

    Links

    ESWC 2020 paper

    Paper discussion at OpenReview

    Test and Dev subsets

    Related paragraphs

    RuWikidata sample

    Dataset is also published on Zenodo

    Usage

    RuBQ 2.0 is suitable for the evaluation of KBQA, MRC, paragraph retrieval, and end-to-end open-domain question answering. The dataset is intended primarily for testing rule-based systems, models based on few-/zero-shot and transfer learning, and models trained on automatically generated examples, similarly to recent MRC datasets. One can also use RuBQ 2.0 as development and test sets in cross-lingual transfer, few-shot learning, or learning-with-synthetic-data scenarios.

    Format

    Dataset files are presented in JSON format as an array of dictionary entries. See full specifications here.

    Examples

    Inherited from RuBQ 1.0:

    Question | Query | Answers | Tags
    Rus: Кто написал роман «Хижина дяди Тома»?

    Eng: Who wrote the novel "Uncle Tom's Cabin"?
    SELECT ?answer 
    WHERE {
    wd:Q2222 wdt:P50 ?answer .
    }
    wd:Q102513
    (Harriet Beecher Stowe)
    1-hop
    Rus: Кто сыграл князя Андрея Болконского в фильме С. Ф. Бондарчука «Война и мир»?

    Eng: Who played Prince Andrei Bolkonsky in S. F. Bondarchuk's film "War and peace"?
    SELECT ?answer
    WHERE {
    wd:Q845176 p:P161 [
    ps:P161 ?answer;
    pq:P453 wd:Q2737140
    ] .
    }
    wd:Q312483
    (Vyacheslav Tikhonov)
    qualifier-constraint
    Rus: Кто на работе пользуется теодолитом?

    Eng: Who uses a theodolite for work?
    SELECT ?answer 
    WHERE {
    wd:Q181517 wdt:P366 [
    wdt:P3095 ?answer
    ] .
    }
    wd:Q1734662
    (cartographer)
    wd:Q11699606
    (geodesist)
    wd:Q294126
    (land surveyor)
    multi-hop
    Rus: Какой океан самый маленький?

    Eng: Which ocean is the smallest?
    SELECT ?answer 
    WHERE {
    ?answer p:P2046/
    psn:P2046/
    wikibase:quantityAmount ?sq .
    ?answer wdt:P31 wd:Q9430 .
    }
    ORDER BY ASC(?sq)
    LIMIT 1
    wd:Q788
    (Arctic Ocean)
    multi-constraint

    reverse

    ranking
    Rus: Сколько дней продолжалась Курская битва?

    Eng: How many days did the battle of Kursk last?
    SELECT ?answer 
    WHERE {
    wd:Q130861 wdt:P580 ?begin .
    wd:Q130861 wdt:P582 ?end .
    BIND (xsd:integer(?end - ?begin + 1) AS ?answer).
    }
    50
    duration

    New in RuBQ 2.0, answer names:
    (lists of Wikidata names may be truncated)

    Question | Answers | WD Label | WD Names | WP Names
    Rus: Кто написал роман «Хижина дяди Тома»?

    Eng: Who wrote the novel "Uncle Tom's Cabin"?
    wd:Q102513
    Гарриет Бичер-Стоу
    Ru: Стоу Гарриет Бичер,
    Бичер-Стоу Гарриет,
    Гарриет Бичер-Стоу,
    Бичер Стоу,
    ...
    En: Christopher Crowfield,
    Harriet Elizabeth Beecher Stowe,
    Enrieta Elizabeth Beecher Stowe,
    Harriet Beecher Stowe
    Гарриет Бичер-Стоу
    Rus: Кто сыграл князя Андрея Болконского в фильме С. Ф. Бондарчука «Война и мир»?

    Eng: Who played Prince Andrei Bolkonsky in S. F. Bondarchuk's film "War and peace"?
    wd:Q312483
    Вячеслав Васильевич Тихонов
    Ru: Тихонов, Вячеслав,
    Вячеслав Тихонов,
    Тихонов Вячеслав Васильевич,
    Вячеслав Васильевич Тихонов,
    ...
    En: Vyacheslav Tikhonov
    Вячеслав Васильевич Тихонов
    Rus: Кто на работе пользуется теодолитом?

    Eng: Who uses a theodolite for work?
    wd:Q1734662



    wd:Q11699606


    ...
  9. HSIP E911 Public Safety Answering Point (PSAP)

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Dec 2, 2020
    + more versions
    Cite
    (Point of Contact) (2020). HSIP E911 Public Safety Answering Point (PSAP) [Dataset]. https://catalog.data.gov/dataset/hsip-e911-public-safety-answering-point-psap
    Explore at:
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    (Point of Contact)
    Description

    911 Public Safety Answering Point (PSAP) service area boundaries in New Mexico. According to the National Emergency Number Association (NENA), a Public Safety Answering Point (PSAP) is a facility equipped and staffed to receive 9-1-1 calls. The service area is the geographic area within which a 911 call placed using a landline is answered at the associated PSAP. This dataset includes only primary PSAPs; secondary PSAPs, backup PSAPs, and wireless PSAPs have been excluded. Primary PSAPs receive calls directly, whereas secondary PSAPs receive calls that have been transferred by a primary PSAP. Backup PSAPs provide service in cases where another PSAP is inoperable.

    Most military bases have their own emergency telephone systems. To connect to such a system from within a military base, it may be necessary to dial a number other than 9-1-1. Due to the sensitive nature of military installations, TGS did not actively research these systems. If civilian authorities in surrounding areas volunteered information about these systems, or if adding a military PSAP was necessary to fill a hole in civilian-provided data, TGS included it in this dataset. Otherwise, military installations are depicted as being covered by one or more adjoining civilian emergency telephone systems.

    In some cases areas are covered by more than one PSAP boundary. In these cases, any of the applicable PSAPs may take a 911 call. Where a specific call is routed may depend on how busy the applicable PSAPs are (i.e. load balancing), operational status (i.e. redundancy), or time of day / day of week. If an area does not have 911 service, TGS included that area in the dataset along with the address and phone number of its dispatch center. These are areas where someone must dial a 7- or 10-digit number to reach emergency services. Such records can be identified by a "Y" in the [NON911EMNO] field, which indicates that dialing 911 inside one of these areas does not connect one with emergency services.

    This dataset was constructed by gathering information about PSAPs from state-level officials. In some cases this was geospatial information; in others it was tabular. This information was supplemented with a list of PSAPs from the Federal Communications Commission (FCC). Each PSAP was researched to verify its tabular information. In cases where the source data was not geospatial, each PSAP was researched to determine its service area in terms of existing boundaries (e.g. city and county boundaries). In some cases existing boundaries had to be modified to reflect coverage areas (e.g. "entire county north of Country Road 30"). However, there may be cases where minor deviations from existing boundaries are not reflected in this dataset, such as where a particular PSAP's coverage area includes an entire county plus the homes and businesses along a road which is partly in another county.

    Text fields in this dataset have been set to all upper case to facilitate consistent database-engine search results. All diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English characters to facilitate use with database systems that may not support diacritics.

  10. Ultimate Arabic News Dataset

    • data.mendeley.com
    Updated May 9, 2022
    + more versions
    Cite
    Ahmed Hashim Al-Dulaimi (2022). Ultimate Arabic News Dataset [Dataset]. http://doi.org/10.17632/jz56k5wxz7.1
    Explore at:
    Dataset updated
    May 9, 2022
    Authors
    Ahmed Hashim Al-Dulaimi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Ultimate Arabic News Dataset is a collection of single-label modern Arabic texts that are used in news websites and press articles.

    Arabic news data was collected by web scraping techniques from many famous news sites such as Al-Arabiya, Al-Youm Al-Sabea (Youm7), the news published on the Google search engine and other various sources.

    • The data we collect consists of two primary files:

    UltimateArabic: A file containing more than 193,000 original Arabic news texts, without pre-processing. The texts contain words, numbers, and symbols that can be removed using pre-processing to increase accuracy when using the dataset in various Arabic natural language processing tasks such as text classification.

    UltimateArabicPrePros: A file containing the data from the first file after pre-processing, reducing it to about 188,000 text documents. Stop words, non-Arabic words, symbols, and numbers have been removed, so this file is ready for direct use in various Arabic natural language processing tasks, such as text classification.

    • We add two samples of data collected by web scraping techniques:

    Sample_Youm7_Politic: An example of news in the "Politic" category collected from the Youm7 website.

    Sample_alarabiya_Sport: An example of news in the "Sport" category collected from the Al-Arabiya website.

    • The data is divided into 10 different categories: Culture, Diverse, Economy, Sport, Politic, Art, Society, Technology, Medical and Religion.
  11. aol-data.tar.bz2

    • figshare.com
    bz2
    Updated Oct 23, 2017
    Cite
    Graham Cormode (2017). aol-data.tar.bz2 [Dataset]. http://doi.org/10.6084/m9.figshare.5527231.v1
    Explore at:
    bz2Available download formats
    Dataset updated
    Oct 23, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Graham Cormode
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AOL search data anonymized and released by AOL Research in 2006.

    "500k User Session Collection

    This collection is distributed for NON-COMMERCIAL RESEARCH USE ONLY. Any application of this collection for commercial purposes is STRICTLY PROHIBITED.

    Brief description: This collection consists of ~20M web queries collected from ~650k users over three months. The data is sorted by anonymous user ID and sequentially arranged. The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.

    The data set includes {AnonID, Query, QueryTime, ItemRank, ClickURL}:

    AnonID - an anonymous user ID number.
    Query - the query issued by the user, case shifted with most punctuation removed.
    QueryTime - the time at which the query was submitted for search.
    ItemRank - if the user clicked on a search result, the rank of the item on which they clicked is listed.
    ClickURL - if the user clicked on a search result, the domain portion of the URL in the clicked result is listed.

    Each line in the data represents one of two types of events: 1. A query that was NOT followed by the user clicking on a result item. 2. A click-through on an item in the result list returned from a query. In the first case (query only) there is data in only the first three columns/fields, namely AnonID, Query, and QueryTime (see above). In the second case (click-through), there is data in all five columns. For click-through events, the query that preceded the click-through is included. Note that if a user clicked on more than one result in the list returned from a single query, there will be TWO lines in the data to represent the two events. Also note that if the user requested the next "page" of results for some query, this appears as a subsequent identical query with a later time stamp.

    CAVEAT EMPTOR: SEXUALLY EXPLICIT DATA! Please be aware that these queries are not filtered to remove any content. Pornography is prevalent on the Web and unfiltered search engine logs contain queries by users who are looking for pornographic material. There are queries in this collection that use SEXUALLY EXPLICIT LANGUAGE. This collection of data is intended for use by mature adults who are not easily offended by the use of pornographic search terms. If you are offended by sexually explicit language you should not read through this data. Also be aware that in some states it may be illegal to expose a minor to this data. Please understand that the data represents REAL WORLD USERS, un-edited and randomly sampled, and that AOL is not the author of this data.

    Basic Collection Statistics

    Dates: 01 March, 2006 - 31 May, 2006

    Normalized queries:
    36,389,567 lines of data
    21,011,340 instances of new queries (w/ or w/o click-through)
    7,887,022 requests for "next page" of results
    19,442,629 user click-through events
    16,946,938 queries w/o user click-through
    10,154,742 unique (normalized) queries
    657,426 unique user IDs

    Please reference the following publication when using this collection: G. Pass, A. Chowdhury, C. Torgeson, "A Picture of Search", The First International Conference on Scalable Information Systems, Hong Kong, June 2006.

    Copyright (2006) AOL"
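Given the five documented columns and the two event types, parsing the log might be sketched as below. The sample lines and the `QueryEvent` type are illustrative, not taken from the actual files.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record type mirroring the five documented columns.
@dataclass
class QueryEvent:
    anon_id: str
    query: str
    query_time: str
    item_rank: Optional[int]   # present only for click-through events
    click_url: Optional[str]   # present only for click-through events

def parse_line(line: str) -> QueryEvent:
    """Parse one tab-separated log line; query-only events carry
    only the first three fields (or leave the last two empty)."""
    fields = line.rstrip("\n").split("\t")
    anon_id, query, query_time = fields[0], fields[1], fields[2]
    item_rank = int(fields[3]) if len(fields) > 3 and fields[3] else None
    click_url = fields[4] if len(fields) > 4 and fields[4] else None
    return QueryEvent(anon_id, query, query_time, item_rank, click_url)

# Hypothetical sample: a query-only event, then a click-through on the
# same query (the preceding query is repeated on the click-through line).
events = [
    parse_line("142\tbest pizza near me\t2006-03-01 08:12:04"),
    parse_line("142\tbest pizza near me\t2006-03-01 08:12:09\t2\thttp://example.com"),
]
click_throughs = [e for e in events if e.click_url is not None]
print(len(click_throughs))  # 1
```

Distinguishing event types by the presence of `ClickURL` is what allows, e.g., counting click-through events separately from query-only lines, as in the collection statistics above.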

  12. BIP! DB: A Dataset of Impact Measures for Research Products

    • zenodo.org
    application/gzip
    Updated Mar 17, 2024
    + more versions
    Cite
    Thanasis Vergoulis; Ilias Kanellos; Claudio Atzori; Andrea Mannocci; Serafeim Chatzopoulos; Sandro La Bruzzo; Natalia Manola; Paolo Manghi (2024). BIP! DB: A Dataset of Impact Measures for Research Products [Dataset]. http://doi.org/10.5281/zenodo.10804822
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    Mar 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Thanasis Vergoulis; Ilias Kanellos; Claudio Atzori; Andrea Mannocci; Serafeim Chatzopoulos; Sandro La Bruzzo; Natalia Manola; Paolo Manghi
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains citation-based impact indicators (a.k.a. "measures") for ~187.8M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). In particular, for each PID, we have calculated the following indicators (organized in categories based on the semantics of the impact aspect that they best capture):

    Influence indicators (i.e., indicators of the "total" impact of each research product; how established it is in general)

    Citation Count: The total number of citations of the product, the most well-known influence indicator.

    PageRank score: An influence indicator based on the PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator (e.g., two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them is very different - the product receiving citations from more influential products will get a larger score).

    Popularity indicators (i.e., indicators of the "current" impact of each research product; how popular the product is currently)

    RAM score: A popularity indicator based on the RAM [2] method. It is essentially a Citation Count where recent citations are considered as more important. This type of "time awareness" alleviates problems of methods like PageRank, which are biased against recently published products (new products need time to receive a number of citations that can be indicative for their impact).

    AttRank score: A popularity indicator based on the AttRank [3] method. AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to examine products which received a lot of attention recently.

    Impulse indicators (i.e., indicators of the initial momentum that the research product received right after its publication)

    Incubation Citation Count (3-year CC): This impulse indicator is a time-restricted version of the Citation Count, where the time window length is fixed for all products and the time window depends on the publication date of the product, i.e., only citations 3 years after each product's publication are counted.

    More details about the aforementioned impact indicators, the way they are calculated and their interpretation can be found here and in the respective references (e.g., in [5]).

    From version 5.1 onward, the impact indicators are calculated in two levels:

    • The PID level (assuming that each PID corresponds to a distinct research product).
    • The OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4] - each distinct article has its own OpenAIRE id).

    Previous versions of the dataset only provided the scores at the PID level.

    From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included).

    Also, from version 7 onward, for each product in our files we also offer an impact class, which informs the user about the percentile into which the product's score falls compared to the impact scores of the rest of the products in the database. The impact classes are: C1 (in top 0.01%), C2 (in top 0.1%), C3 (in top 1%), C4 (in top 10%), and C5 (in bottom 90%).
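    Mapping a score's rank to these percentile classes can be sketched as follows (the boundary semantics and example ranks are illustrative assumptions; the dataset's exact cut-off handling may differ):

```python
# Sketch: map a 1-based rank (scores sorted in descending order) to the
# impact classes C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%),
# C4 (top 10%), C5 (bottom 90%). Boundary handling is an assumption.
def impact_class(rank, total):
    percentile = rank / total * 100.0
    if percentile <= 0.01:
        return "C1"
    if percentile <= 0.1:
        return "C2"
    if percentile <= 1.0:
        return "C3"
    if percentile <= 10.0:
        return "C4"
    return "C5"

# Example ranks in a hypothetical database of one million products:
classes = [impact_class(r, 1_000_000)
           for r in (1, 500, 5_000, 50_000, 900_000)]
```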

    Finally, before version 10, the calculation of the impact scores (and classes) was based on a citation network having one node for each product with a distinct PID that we could find in our input data sources. However, from version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm. This enabled a correction of the scores (more specifically, we avoid counting citation links multiple times when they are made by multiple versions of the same product). As a result, each node in the citation network we build is a deduplicated product having a distinct OpenAIRE id. We still report the scores at PID level (i.e., we assign a score to each of the versions/instances of the product), however these PID-level scores are just the scores of the respective deduplicated nodes propagated accordingly (i.e., all versions of the same deduplicated product will receive the same scores). We have removed a small number of instances (having a PID) that were assigned (in error) to multiple deduplicated records in the OpenAIRE Graph.

    For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score provided) where each line follows the format "identifier

    From version 9 onward, we also provide topic-specific impact classes for PID-identified products. In particular, we associated those products with 2nd level concepts from OpenAlex; we chose to keep only the three most dominant concepts for each product, based on their confidence score, and only if this score was greater than 0.3. Then, for each product and impact measure, we compute its class within its respective concepts. We provide finally the "topic_based_impact_classes.txt" file where each line follows the format "identifier

    The data used to produce the citation network on which we calculated the provided measures have been gathered from the OpenAIRE Graph v7.1.0, including data from (a) OpenCitations' COCI & POCI dataset, (b) MAG [6,7], and (c) Crossref. The union of all distinct citations that could be found in these sources have been considered. In addition, versions later than v.10 leverage the filtering rules described here to remove from the dataset PIDs with problematic metadata.

    References:

    [1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.

    [2] Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380

    [3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)

    [4] P. Manghi, C. Atzori, M. De Bonis, A. Bardi, Entity deduplication in big data graphs for scholarly communication, Data Technologies and Applications (2020).

    [5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)

    [6] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MA) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839

    [7] K. Wang et al., "A Review of Microsoft Academic Services for Science of Science Studies", Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045

    Find our Academic Search Engine built on top of these data here. Note that we also provide all calculated scores through BIP! Finder's API.

    Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the CC0 license.

    More details about BIP! DB can be found in our relevant peer-reviewed publication:

    Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456-460

    We kindly request that any published research that makes use of BIP! DB cite the above article.

  13. The language of sound search: Examining User Queries in Audio Search Engines...

    • zenodo.org
    csv, zip
    Updated Oct 15, 2024
    Cite
    Benno Weck; Benno Weck; Frederic Font; Frederic Font (2024). The language of sound search: Examining User Queries in Audio Search Engines (supplementary materials) [Dataset]. http://doi.org/10.5281/zenodo.13622537
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Benno Weck; Benno Weck; Frederic Font; Frederic Font
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset accompanies the paper titled "The Language of Sound Search: Examining User Queries in Audio Search Engines." The study investigates user-generated textual queries within the context of sound search engines, which are commonly used for applications such as foley, sound effects, and general audio retrieval.

    The paper addresses the gap in current research regarding the real-world needs and behaviors of users when designing text-based audio retrieval systems. By analyzing search queries collected from two sources — a custom survey and Freesound query logs — the study provides insights into user behavior in sound search contexts. Our findings reveal that users tend to formulate longer and more detailed queries when not constrained by existing systems, and that both survey and Freesound queries are predominantly keyword-based.

    This dataset contains the raw data collected from the survey and annotations of Freesound query logs.

    Files in This Dataset

    The dataset includes the following files:

    1. participants.csv
      Contains data from the survey participants. Columns:

      • id: A unique identifier for each participant.
      • fluency: Self-reported English language proficiency.
      • experience: Whether the participant has used online sound libraries before.
      • passed_instructions: Boolean value indicating whether the participant advanced past the instructions page in the survey.
    2. annotations.csv
      Contains annotations of the survey responses, detailing the participants' interaction with the sound search tasks. Columns:

      • id: A unique identifier for each annotation.
      • participant_id: Links to the participant’s ID in participants.csv.
      • stimulus_id: Identifier for the stimulus presented to the participant (audio, image, or text description).
      • stimulus_type: The type of stimulus (audio, image, text).
      • audio_result_id: Identifier for the hypothetical audio result presented during the search task.
      • query1: Initial search query submitted based on the stimulus.
      • query2: Refined search query after seeing the hypothetical search result.
      • aspects1: Aspects considered important when formulating the initial query.
      • aspects2: Aspects considered important when refining the query.
      • result_relevance: Participant's rating of the hypothetical search result's relevance.
      • time: Time taken to complete the search task.
    3. freesound_queries_annotated.csv
      Contains annotated Freesound search queries. Columns:

      • query: Text of the search query submitted to Freesound.
      • count: The number of times the specific query was submitted.
      • topic: Annotated topic of the query, based on an ontology derived from AudioSet, with an additional category, Other, which includes non-English queries and NSFW-related content.
    4. survey_stimuli_data.zip
      This ZIP file contains three CSV files corresponding to the three stimulus types used in the survey:

      • Audio stimuli: Categorized sound recordings presented to participants.
      • Image stimuli: Annotated images that prompted sound-related queries.
      • Text stimuli: Summarized descriptions of sounds provided to participants.

    More details on the stimuli and the survey methodology can be found in the accompanying paper.
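    A minimal sketch of joining the two survey files via participant_id, using Python's csv module (the row values below are invented for demonstration; only the column names come from the description above, and the real files contain more columns and rows):

```python
import csv
import io

# Invented sample rows; only the column names follow the dataset description.
participants_csv = """id,fluency,experience,passed_instructions
p1,native,True,True
p2,intermediate,False,True
"""
annotations_csv = """id,participant_id,stimulus_type,query1,result_relevance
a1,p1,audio,rain on window,4
a2,p2,image,dog barking far away,2
"""

# Index participants by id, then attach each annotation to its
# participant's fluency rating.
participants = {row["id"]: row
                for row in csv.DictReader(io.StringIO(participants_csv))}

joined = []
for row in csv.DictReader(io.StringIO(annotations_csv)):
    row["fluency"] = participants[row["participant_id"]]["fluency"]
    joined.append(row)
```

    With the actual files, the io.StringIO wrappers would simply be replaced by open("participants.csv") and open("annotations.csv").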

    Citation

    If you use this dataset in your research, please cite the corresponding paper:

    B. Weck and F. Font, ‘The Language of Sound Search: Examining User Queries in Audio Search Engines’, in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), Tokyo, Japan, Oct. 2024, pp. 181–185.
    @inproceedings{Weck2024,
      author = "Weck, Benno and Font, Frederic",
      title = "The Language of Sound Search: Examining User Queries in Audio Search Engines",
      booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
      address = "Tokyo, Japan",
      month = "October",
      year = "2024",
      pages = "181--185"
    }
  14. Vogue_pk Dataset

    • universe.roboflow.com
    zip
    Updated May 2, 2023
    Cite
    DEYAN (2023). Vogue_pk Dataset [Dataset]. https://universe.roboflow.com/deyan/vogue_pk
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 2, 2023
    Dataset authored and provided by
    DEYAN
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    PHOTOBY, DESIGNER, MODEL, ISSUEDATE, Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Fashion Magazine Library Management: Operators of a large fashion magazine library can use the VOGUE_PK model to catalog their extensive collection. It can help to classify different editions by issue date, identify styles from specific stylists or designers, and even recognize featured models. This would simplify the process of finding specific issues or fashion styles.

    2. Style Tracking and Analysis: Fashion researchers, analysts, and enthusiasts could use this model to track and analyze the evolution of styles by a particular designer or stylist over time. By identifying the designer or stylist in multiple issues, users can study trends, predict future fashion movements, or create comprehensive style portfolios.

    3. Education and Training: Fashion design students or professionals could use this model as a learning tool to study and analyze the distinct characteristics of various famous designers and stylists' work in different issue dates.

    4. Image-Based Fashion Search Engines: The "VOGUE_PK" model can be instrumental in constructing a powerful image-based search engine. Users could upload an image and receive similar styles, designers, models, and the specific stylist involved in those similar styles.

    5. Content Creation: Fashion content creators, such as bloggers and journalists, can use the model to easily identify the key details about images they're using in articles, posts, or other content. The model can help to ensure that designer, model, stylist, and issue date are correctly attributed.

  15. Data set of the article: Ranking by relevance and citation counts, a...

    • nde-dev.biothings.io
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Cristòfol Rovira (2020). Data set of the article: Ranking by relevance and citation counts, a comparative study: Google Scholar, Microsoft Academic, WoS and Scopus [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_3381150
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Carlos Lopezosa
    Frederic Guerrero-Solé
    Cristòfol Rovira
    Lluís Codina
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data of investigation published in the article "Ranking by relevance and citation counts, a comparative study: Google Scholar, Microsoft Academic, WoS and Scopus".

    Abstract of the article:

    Search engine optimization (SEO) constitutes the set of methods designed to increase the visibility of, and the number of visits to, a web page by means of its ranking on the search engine results pages. Recently, SEO has also been applied to academic databases and search engines, in a trend that is in constant growth. This new approach, known as academic SEO (ASEO), has generated a field of study with considerable future growth potential due to the impact of open science. The study reported here forms part of this new field of analysis. The ranking of results is a key aspect in any information system since it determines the way in which these results are presented to the user. The aim of this study is to analyse and compare the relevance ranking algorithms employed by various academic platforms to identify the importance of citations received in their algorithms. Specifically, we analyse two search engines and two bibliographic databases: Google Scholar and Microsoft Academic, on the one hand, and Web of Science and Scopus, on the other. A reverse engineering methodology is employed based on the statistical analysis of Spearman’s correlation coefficients. The results indicate that the ranking algorithms used by Google Scholar and Microsoft are the two that are most heavily influenced by citations received. Indeed, citation counts are clearly the main SEO factor in these academic search engines. An unexpected finding is that, at certain points in time, WoS used citations received as a key ranking factor, despite the fact that WoS support documents claim this factor does not intervene.
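    The core of the methodology, Spearman's rank correlation between result position and citation counts, can be sketched in pure Python (the positions and citation counts below are invented for illustration, not values from the study):

```python
# Spearman's rank correlation, sketched in pure Python. Data are invented.
def rank(values):
    """Return 1-based ranks, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    rx, ry = rank(x), rank(y)
    n = len(x)
    mean = (n + 1) / 2
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = (sum((a - mean) ** 2 for a in rx)
           * sum((b - mean) ** 2 for b in ry)) ** 0.5
    return num / den

# Result positions 1..5 vs citation counts of the results shown there:
positions = [1, 2, 3, 4, 5]
cite_counts = [500, 350, 120, 80, 10]
rho = spearman(positions, cite_counts)
# Strongly negative rho: top-ranked positions carry the most citations,
# i.e. citations look like a dominant ranking factor.
```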

  16. Railroad Bridges - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    + more versions
    Cite
    (2024). Railroad Bridges - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/railroad-bridges
    Explore at:
    Dataset updated
    Feb 28, 2024
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Bridges-Rail in the United States. According to The National Bridge Inspection Standards published in the Code of Federal Regulations (23 CFR 650.3), a bridge is: a structure, including supports, erected over a depression or an obstruction, such as water, highway, or railway, and having a track or passageway for carrying traffic or other moving loads.

    Each bridge was captured as a point placed in the center of the "main span" (the highest and longest span). For bridges that cross navigable waterways, this was typically the part of the bridge over the navigation channel. If no "main span" was discernable using the available imagery sources, or if multiple non-contiguous main spans were discernable, the point was placed in the center of the overall structure. Bridges sourced from the National Bridge Inventory (NBI) that cross state boundaries are an exception: such bridges are represented in the NBI by two records, and the points for the two records have been located so as to be within the state indicated by the NBI's [STATE_CODE] attribute. In some cases, following these rules did not place the point at the location where the bridge crosses what the user may judge as the most important intersected feature. For example, a given bridge may be many miles long, crossing nothing more than low-lying ground for most of its length but crossing a major interstate at its far end.

    Because bridges are often high, narrow structures crossing depressions that may or may not be too narrow to be represented in the DEM used to orthorectify a given source of imagery, alignment with ortho imagery is highly variable; in particular, apparent bridge location in ortho imagery is highly dependent on collection angle. During verification, TechniGraphics used imagery from the following sources: NGA HSIP 133 City, State or Local; NAIP; DOQQ imagery. In cases where "bridge sway" or "tall structure lean" was evident, TGS attempted to compensate for these factors when capturing the bridge location. For instances in which the bridge was not visible in imagery, it was captured using topographic maps at the intersection of the water and rail line.

    TGS previously processed 784 entities with the HSIP Bridges-Roads dataset (STRAHNET Option - HSIP 133 Cities and Gulf Coast); these entities were added into this dataset after processing. No entities were included for American Samoa, Guam, Hawaii, the Commonwealth of the Northern Mariana Islands, or the Virgin Islands because there are no main line railways in these areas.

    At the request of NGA, text fields in this dataset have been set to all upper case to facilitate consistent database engine search results, leading and trailing spaces were trimmed from all text fields, and all diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English character to facilitate use with database systems that may not support diacritics. The currentness of this dataset is given by the publication date, 09/02/2009; a more precise measure of currentness cannot be provided since this is dependent on the NBI and the source of imagery used during processing.

  17. Share of Yahoo in mobile search market India 2019-2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Share of Yahoo in mobile search market India 2019-2024 [Dataset]. https://www.statista.com/statistics/938848/india-yahoo-share-in-mobile-search-market/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Jun 2019 - Feb 2024
    Area covered
    India
    Description

    Yahoo's share of the mobile search engine market across India was about 0.03 percent in February 2024, a fall compared to its standing of 0.24 percent in September 2018. The immense popularity and database coverage of Google have left little to gain for other search engine operators in India.

  18. Google All TIme Stock Data(Latest)

    • kaggle.com
    zip
    Updated Apr 3, 2026
    Cite
    Shaurya Srivastava (2026). Google All TIme Stock Data(Latest) [Dataset]. https://www.kaggle.com/datasets/shauryasrivastava01/google-all-time-stock-datalatest
    Explore at:
    zip(188253 bytes)Available download formats
    Dataset updated
    Apr 3, 2026
    Authors
    Shaurya Srivastava
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    ⭐ If this dataset helps you, consider giving it an upvote!

    🔍 Context

    Google LLC, a subsidiary of Alphabet Inc., is one of the most influential technology companies in the world. Founded in 1998 and headquartered in Mountain View, California, Google has transformed how people access information and interact with the digital world.

    From a simple search engine to a global tech powerhouse, Google now operates across multiple domains:

    • 🌐 Google Services — Search, YouTube, Ads, Android, Chrome
    • ☁️ Google Cloud — Cloud computing & enterprise solutions
    • 🤖 AI & Innovation — Machine learning, DeepMind, automation
    • 🚀 Other Bets — Health tech, and more

    💡 With a market capitalization often in the trillions of USD, Google remains a dominant force in the global digital economy.

    📁 Dataset Overview

    • 📅 Time Period: Full historical data available
    • 📊 Unit of Analysis: Alphabet Inc. (Google) stock prices
    • 💲 Currency: USD

    🧾 Variables Description

    • date: Trading date
    • open: Opening price at market start
    • high: Highest price during the day
    • low: Lowest price during the day
    • close: Closing price at market end
    • adj_close: Adjusted closing price (splits/dividends)
    • volume: Total shares traded
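    As a small worked example of the trend and volatility analyses listed below, daily returns and their standard deviation can be computed from the close column alone (the prices here are invented, not real quotes from this dataset):

```python
import statistics

# Invented closing prices for illustration; the dataset's `close`
# column would supply the real series.
closes = [100.0, 102.0, 101.0, 105.0, 104.0]

# Simple daily returns from consecutive closes.
returns = [(b - a) / a for a, b in zip(closes, closes[1:])]

# Daily volatility as the sample standard deviation of returns.
volatility = statistics.stdev(returns)
```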

    🚀 What You Can Do With This Dataset

    🎯 Use Cases

    • 📈 Trend & return analysis
    • ⚠️ Risk & volatility insights
    • 🔮 Forecasting & prediction
    • 💻 Algorithmic trading
    • 💼 Portfolio optimization

    Perfect playground for data enthusiasts, ML engineers, and future quants — build, experiment, and innovate!

    🚀 Why This Dataset?

    ✔ Clean and structured financial data
    ✔ Ideal for EDA, visualization, and ML
    ✔ Real-world dataset used in finance & research
    ✔ Great for projects, competitions, and learning

    📌 Whether you're just starting with data analysis or building advanced trading models — this dataset gives you a solid playground to experiment, learn, and create something impactful.

  19. LEUKOS' dataset: A selection of sixteenth and seventeenth century Stambøger...

    • zenodo.org
    • data.europa.eu
    bin, txt
    Updated Aug 31, 2025
    Cite
    Sara Taglialatela; Sara Taglialatela (2025). LEUKOS' dataset: A selection of sixteenth and seventeenth century Stambøger from the Royal Danish Library (Copenhagen) [Dataset]. http://doi.org/10.5281/zenodo.13623651
    Explore at:
    bin, txtAvailable download formats
    Dataset updated
    Aug 31, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sara Taglialatela; Sara Taglialatela
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 8, 2024
    Area covered
    Denmark, Copenhagen
    Description

    The data were generated to document LEUKOS' research for objective 1, point 3 (O1.3) of the research project.

    LEUKOS' research question (RQ) and a brief description of O1: LEUKOS investigates whether and how, during his stay in Wittenberg (1586-1588), Bruno's notions of the soul and of language were influenced by the new Protestant understandings of philosophy, as evidenced by their discussions and suggested reforms of the liberal arts at the philosophical faculty and in Melanchthon's works, and drew conclusions regarding the relation between Giordano Bruno and the Reformation.

    To reconstruct the social, intellectual, and political context in Wittenberg during Bruno's stay (1586-1588) and pave the way for the interpretation of Bruno's praise of the Reformers as tolerant (O1), LEUKOS reconstructs the environment that Bruno joined in Wittenberg at the time when Aristotle was reintroduced in the curricula of Luther's university by focusing (O1.1) on the debate concerning the soul in Wittenberg (Gnesio-Lutherans and Philippists) in its philosophical and theological arguments, but also by referring to the political implications of this debate in the view of tolerance; (O1.2) on the role of female theologians and intellectuals in the reformed communities and in the intellectual world, as the Reformation gave them access to direct participation in cultural and entrepreneurial professions (e.g. as printers); and (O1.3) on Danish intellectuals in Bruno's network, as Wittenberg was a centre of diffusion of the Reformation also for the Scandinavian countries.

    Data generated: pictures of a selection of sixteenth and seventeenth century Stambøger (Alba Amicorum) belonging to the Royal Danish Library collections, documenting the transnational intellectual network between Germany and Copenhagen in the second half of the sixteenth century and the first half of the seventeenth century.

    The data might be useful to researchers working (1) on the history of the book, (2) on the Royal Danish Library's sixteenth and seventeenth century book collections, (3) on Stambøger (Alba Amicorum), and (4) on intellectual networks between Germany and Denmark in the sixteenth and seventeenth century.

    Instrument- or software-specific information needed to interpret the data: it is possible to open a HEIC file on Windows 10 or 11 by:

    1. Using the Microsoft Photos app: if your Windows 10 or 11 is up to date, the Microsoft Photos app should already support the HEIC file format. Right-click on the HEIC file, select "Open With," and choose "Photos" from the list of apps; the app will open the HEIC file and display its contents.
    2. Converting HEIC files to JPEG or PNG: if the Microsoft Photos app doesn't support HEIC files on your system, you can convert HEIC files to JPEG or PNG format using an online converter or dedicated software. Search for "HEIC to JPEG converter" or "HEIC to PNG converter" in your preferred search engine, upload your HEIC file to the converter tool or software, select the desired output format, and convert the file. Once converted, you can view the resulting JPEG or PNG file using any image viewer or photo app on your Windows 10 or 11 PC.
    3. Installing a third-party HEIC viewer: if you frequently work with HEIC files, you can install a dedicated HEIC viewer app from the Microsoft Store or other reputable sources; look for apps that specifically support viewing HEIC files, then use the chosen app to open and view them.

  20. Data from: Identification of Poly(ethylene glycol) and Poly(ethylene...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Shiva Ahmadi; Dominic Winter (2023). Identification of Poly(ethylene glycol) and Poly(ethylene glycol)-Based Detergents Using Peptide Search Engines [Dataset]. http://doi.org/10.1021/acs.analchem.8b00365.s004
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Shiva Ahmadi; Dominic Winter
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Poly(ethylene glycol) (PEG) is one of the most common polymer contaminations in mass spectrometry (MS) samples. At present, the detection of PEG and other polymers relies largely on manual inspection of raw data, which is laborious and frequently difficult due to sample complexity and retention characteristics of polymer species in reversed-phase chromatography. We developed a new strategy for the automated identification of PEG molecules from tandem mass spectrometry (MS/MS) data using protein identification algorithms in combination with a database containing “PEG–proteins”. Through definition of variable modifications, we extend the approach for the identification of commonly used PEG-based detergents. We exemplify the identification of different types of polymers by static nanoelectrospray tandem mass spectrometry (nanoESI-MS/MS) analysis of pure detergent solutions and data analysis using Mascot. Analysis of liquid chromatography–tandem mass spectrometry (LC–MS/MS) runs of a PEG-contaminated sample by Mascot identified 806 PEG spectra originating from four PEG species using a defined set of modifications covering PEG and common PEG-based detergents. Further characterization of the sample for unidentified PEG species using error-tolerant and mass-tolerant searches resulted in identification of 3409 and 3187 PEG-related MS/MS spectra, respectively. We further demonstrate the applicability of the strategy for Protein Pilot and MaxQuant.
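    The regular mass spacing that makes PEG recognizable in MS data can be illustrated with a short sketch computing a singly protonated PEG oligomer series (this is a generic worked example using standard monoisotopic masses, not code from the paper):

```python
# Monoisotopic masses of a PEG oligomer series H-(OCH2CH2)n-OH.
# The characteristic ~44.026 Da spacing between consecutive oligomers
# is what makes PEG contamination recognizable in MS data.
C2H4O = 44.02621    # ethylene oxide repeat unit (monoisotopic)
H2O = 18.01056      # terminal H and OH
PROTON = 1.00728    # charge carrier for [M+H]+

def peg_mz(n, charge=1):
    """m/z of a PEG oligomer with n repeat units and the given charge."""
    mass = n * C2H4O + H2O
    return (mass + charge * PROTON) / charge

series = [peg_mz(n) for n in range(8, 12)]
# Consecutive singly charged PEG ions are spaced ~44.026 m/z apart.
spacing = series[1] - series[0]
```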

Relato Business Graph Database


How to use the dataset

This dataset is a great resource for researchers and practitioners interested in understanding the business relationships between companies. The dataset contains 373,663 links between companies, including information about the type of link, the time it was last updated, and the domains of the two companies. This can be used to identify new potential business partners or competitors by understanding connections within industries or networks of customers.

To get started, explore the columns provided in this dataset: update_time (timestamp of when an entry was last updated), domain (the hostname where the relationship was found during mining), username (the user who last updated the entry), home_name (name of the home company mentioned in the link), link_name (name of the linked company), type (the kind of relationship between the home and linked companies, such as a partnership), home_domain (the domain of the home company), and link_domain (the domain of the second organisation). You can also load the edge list into a visualization tool such as Gephi to examine connection patterns. Say you want to explore connections within a certain industry: filter the entries by keywords such as 'Technology' or 'Telecoms' across the relevant columns to parse through the data efficiently. Once you have identified those entries, follow the linkage points across firms to learn about their partners, customers, and potential investors.
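As a minimal sketch of the filtering step above, the snippet below builds a tiny stand-in edge list with the columns described (in practice you would load the dataset's CSV with pd.read_csv) and keeps only one relationship type for a chosen set of home domains. The domain names and the "partner"/"customer" type values here are hypothetical examples, not values confirmed from the dataset:

```python
import pandas as pd

# Tiny stand-in edge list with the columns described above
# (in practice, load the dataset's CSV with pd.read_csv)
edges = pd.DataFrame(
    {
        "home_domain": ["cloudera.com", "cloudera.com", "example.com"],
        "link_domain": ["intel.com", "dell.com", "cloudera.com"],
        "type": ["partner", "customer", "partner"],  # type values are assumptions
    }
)

# Keep only partnership edges originating from a chosen set of domains
tech_domains = {"cloudera.com"}
partners = edges[
    (edges["type"] == "partner") & (edges["home_domain"].isin(tech_domains))
]
print(len(partners))  # 1 matching edge
```

The same boolean-mask pattern extends to keyword filters on the name columns (e.g. `edges["home_name"].str.contains("Tech", case=False)`).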
Running graph analysis on your subset, for example looking at the most frequent connection points among firms, can surface potential markets for collaboration within your sector, or reveal emerging trends and risks around interdependencies between sectors over time. Tracking the consistency of update times can also let you follow shifting relationship dynamics from one quarter to the next. Finally, always double-check that any observed patterns are backed by sufficient evidence: relationship data gathered manually from the web is not always kept up to date, so stay alert to the errors that commonly arise from manual extraction of internet sources.
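The "most frequent connection points" idea can be sketched with plain in-degree counting: treat each (home_domain, link_domain) pair as a directed edge and count how often each domain is linked to. The edge pairs below are illustrative stand-ins, not real rows from the dataset:

```python
from collections import Counter

# Illustrative (home_domain, link_domain) pairs; in practice, take these
# from the filtered edge list
edge_pairs = [
    ("cloudera.com", "intel.com"),
    ("hortonworks.com", "intel.com"),
    ("example.com", "intel.com"),
    ("cloudera.com", "dell.com"),
]

# In-degree highlights the most-linked-to companies -- candidate hubs
in_degree = Counter(dst for _, dst in edge_pairs)
top_domain, top_count = in_degree.most_common(1)[0]
print(top_domain, top_count)  # intel.com 3
```

For richer measures (centrality, communities), the same edge pairs can be fed into a graph library or exported for Gephi.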

Hopefully these tips provide a starting point for exploring such graphs. Have fun data hunting!

Research Ideas

  • Building an AI/Machine learning powered sales lead manager. By analyzing the graph, this tool could provide insights about a company's potential customers, partners, competitors and suppliers.
  • Creating a visualization tool for marketers to understand their markets better by visualizing the connections between companies in different industries and sectors.
  • Creating an interactive web-based search engine that allows users to quickly ...