Offers exclusive access to patent application status information for unpublished patent applications, available only to the applicant/inventor or his/her representative(s). Private PAIR includes bibliographic data, patent term adjustments, continuity data, foreign priority, and address and attorney/agent information from the Patent Application Locating and Monitoring (PALM) System; PDF images of documents (including correspondence) and a transaction history from the Content Management System (CMS) (formerly the Image File Wrapper (IFW) System); and fee information from the Fee Processing Next Generation (FPNG) System. Search is by application number (with or without the two-digit series code), control number, or Patent Cooperation Treaty (PCT) number. Private PAIR requires users to establish a USPTO.gov account, a customer number, and a password. For more information about establishing a USPTO.gov account and customer number: https://www.uspto.gov/patents-application-process/applying-online/getting-started-new-users Unavailable during database backups (Saturday, Tuesday, and Thursday from 04:30 to 04:45 AM U.S. Eastern Time, and Sunday from 00:01 to 04:00 AM U.S. Eastern Time). Updated daily. https://ppair-my.uspto.gov/pair/PrivatePair
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This submission includes all pretrained models, test data and prediction files for the arXiv paper "Simple Applications of BERT for Ad Hoc Document Retrieval". Please follow the instructions at the Birch repo to reproduce the results.
CAIRSS is a bibliographic database of older music research literature (prior to 1993) in music education, music psychology, music therapy, and music medicine. Citations have been taken from 1,354 different journal titles, 18 of which are primary journals, meaning that every article ever to appear in them is included. The primary journals are:
* Arts in Psychotherapy
* Bulletin of the Council for Research in Music Education
* Bulletin of the National Association for Music Therapy
* Contributions to Music Education
* Hospital Music Newsletter
* International Journal of Arts Medicine
* Journal of the Association for Music and Imagery
* Journal of Music Teacher Education
* Journal of Music Therapy
* Journal of Research in Music Education
* Medical Problems of Performing Artists
* Music Perception
* Music Therapy
* Music Therapy Perspectives
* Psychology of Music
* Psychomusicology
* The Quarterly
* Applications of Research to Music Education
Cross-language Information Retrieval (CLIR) has been studied at TREC and subsequent evaluation forums for more than twenty years, but recent advances in the application of deep learning to information retrieval (IR) warrant a new, large-scale effort that will enable exploration of classical and modern IR techniques for this task.
Information about the turnaround time for fulfilling public requests for copies of planning and construction documents, such as past permits, plans, and construction drawings.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: This article presents a model for information retrieval in linked open data that uses complex network methods and operations for ranking and selecting information, and concept maps for presenting the retrieved information to the user. The model shows the relationships between query terms that represent an informational need and presents them as concept maps. The underlying hypothesis is that the user’s relationship to the retrieved information occurs in the light of Brookes’ fundamental equation of information science. The cognitive structure of the cognoscente is a complex network that is modulated by the retrieved information which, in turn, is derived from a complex network. The final complex network is mapped into a resulting concept map enhanced by heuristics, such as the application of controlled vocabulary. The first study conducted, which was qualitative and exploratory, was an information retrieval pilot test. It allowed the assessment of the algorithms used in the ranking and selection of the intermediate information networks and provided the framework for the implementation of a prototype. The prototype used a knowledge base of linked open data, derived from DBpedia, on which complex network analyses were carried out. The validation of the model presented relevant recall and precision when applied to a group of 17 users. The results are promising for the use of complex network operations and concept maps for information retrieval, especially with linked data. Further research should observe the demand for more interactive actions and conduct experiments in other knowledge bases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The efficient retrieval of information relevant to organizational business still constitutes a challenge for information managers, especially if the goal is to stimulate and facilitate wide access to information generated inside and outside the organization in order to carry out strategic actions. From this perspective, information management, that is, the scanning and mining of information to support strategic decisions and actions, has become a crucial activity for organizations seeking competitive advantage. Accessing, seeking, appropriating, and using information in organizational contexts has become a complex activity because of the subjectivity of information. This characteristic requires the information manager to develop keen insight into the informational world in order to meet informational needs and demands effectively.
https://www.marketreportanalytics.com/privacy-policy
The global Document Management and Retrieval System (DMRS) market is experiencing robust growth, driven by the increasing need for efficient information management across diverse sectors. The rising volume of digital documents, coupled with stringent regulatory compliance requirements and the growing adoption of cloud-based solutions, are key factors fueling market expansion. Academic institutions, corporations, and the public sector are increasingly relying on DMRS to streamline workflows, enhance collaboration, and ensure data security. The market is segmented by application (Academic, Corporate, Public Sector) and type (Cloud-based, On-premises), with cloud-based solutions gaining significant traction due to their scalability, accessibility, and cost-effectiveness. Key players like Clarivate, Elsevier, and Digital Science are driving innovation through continuous product development and strategic partnerships. While the on-premises segment retains a presence, the shift towards cloud-based solutions is anticipated to continue, driven by the benefits of remote access and reduced infrastructure costs. Regional variations exist, with North America and Europe currently holding significant market shares, although Asia-Pacific is projected to witness substantial growth in the coming years, fueled by increasing digitalization and technological advancements. The competitive landscape is characterized by both established players and emerging companies offering specialized solutions. This leads to a dynamic market with a focus on continuous improvement and innovation. The forecast period (2025-2033) anticipates sustained growth, propelled by technological advancements like AI-powered search and retrieval capabilities, improved integration with other business applications, and the increasing demand for robust security features. 
The market is expected to consolidate somewhat, with larger players potentially acquiring smaller firms to expand their product portfolios and market reach. Despite the strong growth outlook, challenges remain, including data security concerns, integration complexities, and the need for user-friendly interfaces. Addressing these concerns through continuous innovation and user-centric design will be crucial for sustained market success. The market is expected to witness a gradual shift towards more sophisticated and integrated DMRS solutions, catering to the evolving needs of diverse user groups.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2
Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
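Hydration means looking up the full tweet objects from their IDs through Twitter's API. The sketch below only covers the preparation step: parsing an ID file and grouping the IDs into batches for lookup. It assumes one Tweet ID per line in each .txt file and a batch size of 100; both are assumptions, not details stated in the dataset description, and the function names `parse_tweet_ids` and `batched` are hypothetical. The lookup call itself is deliberately left out.

```python
from typing import Iterable, Iterator, List


def parse_tweet_ids(text: str) -> List[str]:
    """Split file contents into Tweet IDs, assuming one ID per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def batched(ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size IDs (100 is an assumed limit)."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Example with inline text; for a real file one might use
# Path("TweetIDs_Part1.txt").read_text(encoding="utf-8") instead.
sample_ids = parse_tweet_ids("111\n222\n\n333\n")
sample_batches = list(batched(sample_ids, batch_size=2))
```

Keeping the IDs as strings avoids any accidental numeric formatting of the long identifiers.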
https://exactitudeconsultancy.com/privacy-policy
The search engine market is projected to be valued at $150 billion in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of 5.5%, reaching approximately $250 billion by 2034.
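The growth figures above can be checked with the standard compound-growth formula. This is a minimal sketch that assumes the 5.5% rate compounds annually over the ten years from 2024 to 2034; the function names `project_value` and `implied_cagr` are hypothetical.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate for the given years."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual rate that grows start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Values in billions of US dollars, taken from the description above.
projected_2034 = project_value(150.0, 0.055, 10)
rate_for_250 = implied_cagr(150.0, 250.0, 10)
```

Under these assumptions, 5.5% annual growth takes $150 billion to roughly $256 billion, which is consistent with the rounded figure of approximately $250 billion; reaching exactly $250 billion would correspond to a rate slightly above 5.2%.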
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This document provides a comprehensive description of the dataset used to construct the South African Informal Business Sector (SAIBUS) ontology throughout the study. It includes information on the sources of the data used and the requirement descriptions. It also provides details on the use cases and competency questions that were used to identify the requirements for the ontology, as well as the mapping of these questions to the six dimensions that each sub-ontology should satisfy. The dataset document is a valuable resource for researchers and developers who wish to use the SAIBUS ontology for intelligent search, service recommendation, semantic processing of queries, information retrieval, and intelligent reasoning by apps.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes the raw data for variables used in the survey of primary care providers reported in "Barriers and Facilitators to Use of a Clinical Evidence Technology in the Management of Skin Problems in Primary Care: Insights from Mixed Methods", Jour Med Lib Assoc, July 2020. The data comprise 39 variables (columns) from survey questions and 21 cases (rows) with responses from participants. The data dictionary file includes the variable name, type of response, and the survey questions in a comma-separated file.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The original release of the Patent Examination Research Dataset (PatEx) contains detailed information on 9.2 million publicly viewable patent applications filed with the USPTO through December 2014. Currently, two updates of the dataset are available as well, the most recent posted in November 2017 (and referred to as the 2016 release). This latest release covers all activity through 2016, but also includes activity through late June of 2017. It is called the 2016 release because 2016 is the latest year for which PatEx provides information on all activities. There are several data files, each of which coincides with a tab on USPTO’s Public PAIR web portal. The data files include information on each application’s characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.
USPTO Patent Examination Research Data (PatEx) contains detailed information on millions of publicly viewable patent applications filed with the USPTO. The data are sourced from the Public Patent Application Information Retrieval system (Public PAIR).
“USPTO Patent Examination Research Dataset” by the USPTO, for public use. Graham, S., Marco, A., and Miller, A. (2015). “The USPTO Patent Examination Research Dataset: A Window on the Process of Patent Examination.”
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_pair
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The peer-reviewed publication for this dataset was presented at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.
This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.
The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).
The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).
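To illustrate the first-stage lexical scoring named above, here is a minimal sketch of the Okapi BM25 scoring function. It is not the dataset authors' pipeline: the whitespace tokenization, the parameter values `k1 = 1.5` and `b = 0.75`, the particular IDF variant, and the function name `bm25_scores` are all assumptions made for illustration, and the MonoT5 re-ranking and BERTScore steps are omitted.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, documents: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Toy claims, invented for this example only.
claims = [
    "masks reduce covid transmission",
    "face masks reduce the transmission of covid",
    "vitamin c cures covid",
]
scores = bm25_scores("masks reduce covid transmission", claims)
```

In a de-duplication setting, each claim would be used as a query against the others, and high-scoring pairs would be passed on to a stronger model for a final similarity judgement.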
The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.
The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using Spacy.
The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.
The data sources used are:
The CoronaVirusFacts/DatosCoronaVirus Alliance Database. https://www.poynter.org/ifcn-covid-19-misinformation/
CoAID dataset (Cui and Lee, 2020) https://github.com/cuilimeng/CoAID
MM-COVID (Li et al., 2020) https://github.com/bigheiniu/MM-COVID
CovidLies (Hossain et al., 2020) https://github.com/ucinlp/covid19-data
TREC Health Misinformation track https://trec-health-misinfo.github.io/
TREC COVID challenge (Voorhees et al., 2021; Roberts et al., 2020) https://ir.nist.gov/covidSubmit/data.html
The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).
The entries in the dataset contain the following information:
Claim. Text of the claim.
Claim label. The labels are: False, and True.
Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.
Original information source. Information about which general information source was used to obtain the claim.
Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.
Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).
References
Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022. https://arxiv.org/abs/2205.02596
Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.
Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “Is this document relevant? ... probably”: a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.
Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.
Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.
Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.
Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.
Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.
Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The image queries were used in the following studies:
* Y. Bitirim, S. Bitirim, D. Ç. Ertuğrul and Ö. Toygar, “An Evaluation of Reverse Image Search Performance of Google”, 2020 IEEE 44th Annual Computer Software and Applications Conference (COMPSAC), pp. 1368-1372, IEEE, Madrid, Spain, July 2020. (DOI: 10.1109/COMPSAC48688.2020.00-65)
** Y. Bitirim, “Retrieval Effectiveness of Google on Reverse Image Search”, Journal of Imaging Science and Technology, Vol. 66, No. 1, pp. 010505-1-010505-6, January 2022. (DOI: 10.2352/J.ImagingSci.Technol.2022.66.1.010505)
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license
Abstract
A set of easy-to-use FORTRAN routines for building and accessing data structures of the type commonly encountered in scientific applications is introduced. Fetch and insert times go as approximately log(n), where n is the number of elements in the list. The routines implement AVL or height-balanced binary tree logic. Each tree is a linear integer array. The first ten elements of a tree array specify its structure and the remaining elements are dedicated to node information. Each node includes key and ...
Title of program: BBTREE
Catalogue Id: ABJR_v1_0
Nature of problem
Typical scientific programming applications require numerous calls to one or more subroutines. The intermediate results generated by these calls are usually not saved; if the same information is required at a later stage it is simply recalculated. While wasteful of cpu power, this modus operandi is attractive because it spares the user the time and effort associated with the development of complicated data storage and retrieval algorithms. However, if the number of redundant calls to a particula ...
Versions of this program held in the CPC repository in Mendeley Data
ABJR_v1_0; BBTREE; 10.1016/0010-4655(89)90076-3
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
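The height-balanced (AVL) logic described above can be sketched in a few lines. This is an illustrative Python sketch, not the BBTREE code: it uses linked node objects rather than the linear integer array layout the FORTRAN routines use, and the names `Node`, `insert`, and `fetch` are hypothetical. It stores the result of a stand-in "expensive" computation (here simply `k * k`) under an integer key so it can be fetched instead of recalculated.

```python
from typing import Any, Optional


class Node:
    def __init__(self, key: int, value: Any) -> None:
        self.key, self.value = key, value
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1  # height of the subtree rooted at this node


def height(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node: Optional[Node], key: int, value: Any) -> Node:
    """Insert or overwrite key, rebalancing on the way back up."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
        return node
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left subtree too tall
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)  # left-right case
        return rotate_right(node)
    if balance < -1:  # right subtree too tall
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def fetch(node: Optional[Node], key: int) -> Any:
    """Return the stored value for key, or None if the key is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Cache 1023 results, inserted in sorted order (the worst case for an
# unbalanced binary tree).
root = None
for k in range(1, 1024):
    root = insert(root, k, k * k)
```

Because the tree is rebalanced after every insertion, its height stays logarithmic in the number of stored keys even for sorted input, which is what keeps fetch and insert times near log(n).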
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a copy of the RELISH dataset used in the paper "Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity" by Sheshera Mysore, Arman Cohan, Tom Hope. The RELISH dataset was first introduced in Brown et al. 2019. See further details of the paper, how this dataset was compiled, and how it was used: https://github.com/allenai/aspire
The contents of the dataset are as follows:
abstracts-relish.jsonl: jsonl file containing the paper-id, abstracts, and titles for the queries and candidates which are part of the dataset.
relish-queries-release.csv: Metadata associated with every query.
test-pid2anns-relish.json: JSON file with the query paper-id and the candidate paper-ids for every query paper in the dataset. Use these files in conjunction with abstracts-relish.jsonl to generate files for use in model evaluation.
relish-evaluation_splits.json: Paper-ids for the splits to use in reporting evaluation numbers.
aspire/src/evaluation/ranking_eval.py, included in the github repo accompanying this dataset, implements the evaluation protocol and computes evaluation metrics. Please see the paper for descriptions of the experimental protocol we recommend to report evaluation metrics.
Text embedding Datasets
The text embedding datasets consist of several (query, passage) paired datasets aiming for text-embedding model finetuning. These datasets are ideal for developing and testing algorithms in the fields of natural language processing, information retrieval, and similar applications.
Dataset Details
Each dataset in this collection is structured to facilitate the training and evaluation of text-embedding models. The datasets are diverse, covering… See the full description on the dataset page: https://huggingface.co/datasets/ProfessorBob/text-embedding-dataset.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains the data from research comparing text, voice, and combined conversational agent voting advice applications. Conversational Agent Voting Advice Applications (CAVAAs) have been proven to be valuable information retrieval systems for citizens who want to obtain voting advice based on their answers to political attitude statements, but who first desire additional on-demand information about the political issues through a chatbot functionality. Research on CAVAAs is relatively young, and in previous studies only the effects of textual CAVAAs have been examined. In light of the positive effects of these tools found in earlier studies, we compared different modalities in which information can be requested to further optimize the design of these information retrieval systems. In an experimental study (N = 60), three CAVAA modalities (text, voice, or a combination of text and voice) were compared on tool evaluation measures (ease of use, usefulness, and enjoyment), political measures (perceived and factual political knowledge), and usage measures (the amount of information retrieved from the chatbot and miscommunication). Results show that the textual and combined CAVAA outperformed the voice CAVAA on several aspects: the voice CAVAA received lower ease of use and usefulness scores, respondents requested less additional information, and they experienced more miscommunication when interacting with the chatbot. Furthermore, given that the predefined buttons were predominantly used and stimulated users to request more and different types of information, it can be concluded that CAVAAs should make information accessible in an easy way to play into CAVAA users’ processing mode of low elaboration.
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "Why are online catalogs hard to use? Lessons learned from information-retrieval studies".