100+ datasets found

PRIVATE Patent Application Information Retrieval (PAIR)
catalog.data.gov
Updated Jul 15, 2022
Cite
Patents (2022). PRIVATE Patent Application Information Retrieval (PAIR) [Dataset]. https://catalog.data.gov/dataset/private-patent-application-information-retrieval-pair
Explore at:
Dataset updated
Jul 15, 2022
Dataset provided by
Patents
Description
Offers exclusive access to patent application status information for unpublished patent applications only to the applicant/inventor or his/her representative(s). Private PAIR includes bibliographic, patent term adjustments, continuity data, foreign priority, and address & attorney/agent information from the Patent Application Locating and Monitoring (PALM) System; PDF images of documents (including correspondence) and a transaction history from the Content Management System (CMS) (formerly the Image File Wrapper (IFW) System); and fee information from the Fee Processing Next Generation (FPNG) System. Search is by application number (with or without the two-digit series code), control number, or Patent Cooperation Treaty (PCT) number. Private PAIR requires users to establish a USPTO.gov account and customer number, and establish a password. For more information about establishing a USPTO.gov account and customer number: https://www.uspto.gov/patents-application-process/applying-online/getting-started-new-users Unavailable during database backups (Saturday, Tuesday, and Thursday from 04:30 - 04:45 AM U.S. Eastern Time and Sunday 00:01 - 04:00 AM U.S. Eastern Time). Updated daily. https://ppair-my.uspto.gov/pair/PrivatePair
Models and Data for Simple Applications of BERT for Ad Hoc Document...
data.niaid.nih.gov
Updated Jan 24, 2020
Cite
Yang, Wei (2020). Models and Data for Simple Applications of BERT for Ad Hoc Document Retrieval [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3241944
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Yang, Wei
Zhang, Haotian
Lin, Jimmy
Akkalyoncu Yilmaz, Zeynep
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This submission includes all pretrained models, test data and prediction files for the arXiv paper "Simple Applications of BERT for Ad Hoc Document Retrieval". Please follow the instructions at the Birch repo to reproduce the results.
Computer-Assisted Information Retrieval Service System for Music
rrid.site
dknet.org
+2more
Updated Jul 12, 2025
Cite
(2025). Computer-Assisted Information Retrieval Service System for Music [Dataset]. http://identifiers.org/RRID:SCR_008177
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008177
Dataset updated
Jul 12, 2025
Description
CAIRSS is a bibliographic database of older music research literature (prior to 1993) in music education, music psychology, music therapy, and music medicine. Citations have been taken from 1,354 different journal titles, 18 of which are primary journals, meaning that every article ever to appear is included. The primary journals are:
* Arts in Psychotherapy
* Bulletin of the Council for Research in Music Education
* Bulletin of the National Association for Music Therapy
* Contributions to Music Education
* Hospital Music Newsletter
* International Journal of Arts Medicine
* Journal of the Association for Music and Imagery
* Journal of Music Teacher Education
* Journal of Music Therapy
* Journal of Research in Music Education
* Medical Problems of Performing Artists
* Music Perception
* Music Therapy
* Music Therapy Perspectives
* Psychology of Music
* Psychomusicology
* The Quarterly
* Applications of Research to Music Education
TREC 2022 NeuCLIR Dataset
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
data.nist.gov
+1more
Updated Jul 9, 2025
Cite
National Institute of Standards and Technology (2025). TREC 2022 NeuCLIR Dataset [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/2022-neuclir-dataset
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
National Institute of Standards and Technology http://www.nist.gov/
Description
Cross-language Information Retrieval (CLIR) has been studied at TREC and subsequent evaluation forums for more than twenty years, but recent advances in the application of deep learning to information retrieval (IR) warrant a new, large-scale effort that will enable exploration of classical and modern IR techniques for this task.
DSD - Document Retrieval Applications
citydata.mesaaz.gov
data.mesaaz.gov
Updated Aug 20, 2025
Cite
Development Services (2025). DSD - Document Retrieval Applications [Dataset]. https://citydata.mesaaz.gov/Development-Services/DSD-Document-Retrieval-Applications/knta-3f2w
Explore at:
Available download formats: kml, xlsx, csv, xml, kmz, application/geo+json
Dataset updated
Aug 20, 2025
Dataset authored and provided by
Development Services
Description
Information about the turnaround time for fulfilling requests from the public for copies of planning and construction documents such as past permits, plans, and construction drawings.
Data from: Information retrieval in linked data: A model based on concept...
scielo.figshare.com
jpeg
Updated Jun 2, 2023
Cite
Henrique Monteiro CRISTOVÃO; Jorge Henrique Cabral FERNANDES (2023). Information retrieval in linked data: A model based on concept maps and complex networks analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6502964.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.6502964.v1
Dataset updated
Jun 2, 2023
Dataset provided by
SciELO journals
Authors
Henrique Monteiro CRISTOVÃO; Jorge Henrique Cabral FERNANDES
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: This article presents a model for information retrieval in linked open data using methods and complex network operations for ranking and selecting information, and concept maps for presenting the retrieved information to the user. The model shows the relationships between query terms that represent an informational need and presents them as concept maps. The underlying hypothesis is that the user’s relationship to the retrieved information occurs in the light of Brookes’ fundamental equation of information science. The cognitive structure of the cognoscente is a complex network that is modulated by the retrieved information which, in turn, is derived from a complex network. The final complex network is mapped into a resulting concept map enhanced by heuristics, such as the application of controlled vocabulary. The first study conducted, with qualitative characteristics and using an exploratory approach, was an information retrieval pilot test. It allowed the assessment of the algorithms used in the ranking and selection of the intermediate information networks and provided the framework for the implementation of a prototype. The prototype used a knowledge base of linked open data, derived from DBpedia, on which complex network analyses were carried out. The validation of the model presented relevant recall and precision when applied to a group of 17 users. The results are promising for the use of complex network operations and concept maps for information retrieval, especially in linked data. Further research should observe the demand for more interactive actions and conduct experiments in other knowledge bases.
Data from: Process of search and retrieval of information in organizational...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
Thiciane Mary Carvalho Teixeira; Marta Lígia Pomim Valentim (2023). Process of search and retrieval of information in organizational environments: a theoretical reflection on the subjectivity of information [Dataset]. http://doi.org/10.6084/m9.figshare.5931103.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.5931103.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Thiciane Mary Carvalho Teixeira; Marta Lígia Pomim Valentim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT: The efficient retrieval of information relevant to organizational business still constitutes a challenge for information managers, especially if the goal is to stimulate and facilitate wide access to information generated inside and outside the organization in order to carry out strategic actions. From this perspective, information management has become a crucial activity for organizations; in other words, information scanning and mining facilitate strategic decisions and actions that provide organizational competitive advantages. Accessing, seeking, appropriating, and using information in organizational contexts has become a complex activity due to the subjectivity of information. This characteristic requires the information manager to develop keen insight into the informational world in order to meet informational needs and demands effectively.
Document Management and Retrieval System Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 3, 2025
Cite
Market Report Analytics (2025). Document Management and Retrieval System Report [Dataset]. https://www.marketreportanalytics.com/reports/document-management-and-retrieval-system-55257
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
Apr 3, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Document Management and Retrieval System (DMRS) market is experiencing robust growth, driven by the increasing need for efficient information management across diverse sectors. The rising volume of digital documents, coupled with stringent regulatory compliance requirements and the growing adoption of cloud-based solutions, are key factors fueling market expansion. Academic institutions, corporations, and the public sector are increasingly relying on DMRS to streamline workflows, enhance collaboration, and ensure data security. The market is segmented by application (Academic, Corporate, Public Sector) and type (Cloud-based, On-premises), with cloud-based solutions gaining significant traction due to their scalability, accessibility, and cost-effectiveness. Key players like Clarivate, Elsevier, and Digital Science are driving innovation through continuous product development and strategic partnerships. While the on-premises segment retains a presence, the shift towards cloud-based solutions is anticipated to continue, driven by the benefits of remote access and reduced infrastructure costs. Regional variations exist, with North America and Europe currently holding significant market shares, although Asia-Pacific is projected to witness substantial growth in the coming years, fueled by increasing digitalization and technological advancements. The competitive landscape is characterized by both established players and emerging companies offering specialized solutions. This leads to a dynamic market with a focus on continuous improvement and innovation. The forecast period (2025-2033) anticipates sustained growth, propelled by technological advancements like AI-powered search and retrieval capabilities, improved integration with other business applications, and the increasing demand for robust security features. 
The market is expected to consolidate somewhat, with larger players potentially acquiring smaller firms to expand their product portfolios and market reach. Despite the strong growth outlook, challenges remain, including data security concerns, integration complexities, and the need for user-friendly interfaces. Addressing these concerns through continuous innovation and user-centric design will be crucial for sustained market success. The market is expected to witness a gradual shift towards more sophisticated and integrated DMRS solutions, catering to the evolving needs of diverse user groups.
Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...
data.mendeley.com
Updated Jul 25, 2022
Cite
Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
Explore at:
Unique identifier
https://doi.org/10.17632/xmcg82mx9k.3
Dataset updated
Jul 25, 2022
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
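
Before hydrating, it can help to confirm that the downloaded files match the counts listed above. The following is a minimal sketch, not part of the dataset itself: it assumes each .txt file holds one Tweet ID per line (the description does not state the layout), and the helper names are hypothetical. Hydration is left out because it depends on the API access available to the reader.

```python
from pathlib import Path

# Per-file counts copied from the dataset description above.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}
TOTAL_EXPECTED = sum(EXPECTED_COUNTS.values())


def parse_tweet_ids(text):
    """Split file contents into Tweet IDs (assumption: one ID per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_counts(directory):
    """Return {filename: (found, expected)} for each file present in directory."""
    report = {}
    for name, expected in EXPECTED_COUNTS.items():
        file_path = Path(directory) / name
        if file_path.exists():
            ids = parse_tweet_ids(file_path.read_text(encoding="utf-8"))
            report[name] = (len(ids), expected)
    return report


if __name__ == "__main__":
    print("Expected total:", TOTAL_EXPECTED)
    for name, (found, expected) in check_counts(".").items():
        status = "ok" if found == expected else "MISMATCH"
        print(f"{name}: found {found}, expected {expected} ({status})")
```

IDs are kept as strings so that they are not altered by numeric conversion before being passed to a hydration tool.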
Global Search Engine Market Research Report By Product Type (Paid Search,...
exactitudeconsultancy.com
Updated May 2025
Cite
Exactitude Consultancy (2025). Global Search Engine Market Research Report By Product Type (Paid Search, Organic Search, Local Search), By Application (E-commerce, Information Retrieval, Advertising), By End User (Individual, Small Medium Enterprises, Large Enterprises), By Technology (AI-Powered Search, Voice Search, Mobile Search), By Distribution Channel (Web-based, Mobile Applications) – Forecast to 2034. [Dataset]. https://exactitudeconsultancy.com/reports/61815/global-search-engine-market
Explore at:
Dataset updated
May 2025
Dataset authored and provided by
Exactitude Consultancy
License
https://exactitudeconsultancy.com/privacy-policy
Description
The search engine market is projected to be valued at $150 billion in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of 5.5%, reaching approximately $250 billion by 2034.
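
The figures in this description can be checked with the standard compound-growth formula. The sketch below is illustrative only and assumes ten annual compounding periods between 2024 and 2034; the function names are hypothetical.

```python
def project_value(start_value, annual_rate, years):
    """Compound start_value at annual_rate for the given number of years."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Values taken from the description, in billions of US dollars.
projected_2034 = project_value(150.0, 0.055, 10)
rate_for_250 = implied_cagr(150.0, 250.0, 10)

print(f"150 at 5.5% for 10 years: {projected_2034:.1f} billion")
print(f"CAGR implied by 150 -> 250 over 10 years: {rate_for_250:.2%}")
```

Under these assumptions a 5.5% rate gives a value somewhat above $250 billion, and reaching exactly $250 billion implies a rate slightly below 5.5%, which is consistent with the description's "approximately".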
Development of a Domain Ontology to Support Information Retrieval on the...
esango.cput.ac.za
pdf
Updated Jun 3, 2023
Cite
Glodi Mokombati Atoba (2023). Development of a Domain Ontology to Support Information Retrieval on the South African Informal Sector Services [Dataset]. http://doi.org/10.25381/cput.22344517.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.25381/cput.22344517.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Cape Peninsula University of Technology
Authors
Glodi Mokombati Atoba
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
South Africa
Description
This document provides a comprehensive description of the dataset used to construct the South African Informal Business Sector (SAIBUS) ontology throughout the study. It includes information on the sources of the data used and the requirement descriptions. It also provides details on the use cases and competency questions that were used to identify the requirements for the ontology, as well as the mapping of these questions to the six dimensions that each sub-ontology should satisfy. The dataset document is a valuable resource for researchers and developers who wish to use the SAIBUS ontology for intelligent search, service recommendation, semantic processing of queries, information retrieval, and intelligent reasoning by apps.
Use of a Clinical Evidence Technology for Skin Disease in Primary Care:...
figshare.com
txt
Updated May 30, 2023
Cite
Marianne Burke; Benjamin Littenberg (2023). Use of a Clinical Evidence Technology for Skin Disease in Primary Care: Clinician Survey Data [Dataset]. http://doi.org/10.6084/m9.figshare.11893875.v3
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11893875.v3
Dataset updated
May 30, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Marianne Burke; Benjamin Littenberg
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set includes the raw data for variables used in the survey of primary care providers reported in "Barriers and Facilitators to Use of a Clinical Evidence Technology in the Management of Skin Problems in Primary Care: Insights from Mixed Methods", Jour Med Lib Assoc, July 2020. The data comprise 39 variables (columns) from survey questions and 21 cases (rows) with responses from participants. The data dictionary file includes the variable name, type of response, and the survey questions in a comma-separated file.
USPTO Patent Examination Research Data (PatEx)
kaggle.com
zip
Updated Feb 12, 2019
Cite
Google BigQuery (2019). USPTO Patent Examination Research Data (PatEx) [Dataset]. https://www.kaggle.com/datasets/bigquery/uspto-oce-pair
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
BigQuery https://cloud.google.com/bigquery
Authors
Google BigQuery
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

Context

The original release of the Patent Examination Research Dataset (PatEx) contains detailed information on 9.2 million publicly viewable patent applications filed with the USPTO through December 2014. Currently, two updates of the dataset are available as well, the most recent posted in November 2017 (and referred to as the 2016 release). This latest release covers all activity through 2016, but also includes activity through late June of 2017. It is called the 2016 release because 2016 is the latest year for which PatEx provides information on all activities. There are several data files, each of which coincides with a tab on USPTO’s Public PAIR web portal. The data files include information on each application’s characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.

Content

USPTO Patent Examination Research Data (PatEx) contains detailed information on millions of publicly viewable patent applications filed with the USPTO. The data are sourced from the Public Patent Application Information Retrieval system (Public PAIR).

Acknowledgements

“USPTO Patent Examination Research Dataset” by the USPTO, for public use. Graham, S., Marco, A., and Miller, A. (2015). “The USPTO Patent Examination Research Dataset: A Window on the Process of Patent Examination.”

Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_pair

Banner photo by Samuel Zeller on Unsplash
Data from: PANACEA dataset - Heterogeneous COVID-19 Claims
data.niaid.nih.gov
explore.openaire.eu
+1more
Updated Jul 15, 2022
Cite
Zubiaga, Arkaitz (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6493846
Explore at:
Dataset updated
Jul 15, 2022
Dataset provided by
He, Yulan
Zubiaga, Arkaitz
Procter, Rob
Kochkina, Elena
Arana-Catania, Miguel
Liakata, Maria
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The peer-reviewed publication for this dataset has been presented in the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.

This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).
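
To illustrate the first-stage lexical similarity mentioned above, here is a minimal, self-contained sketch of BM25 scoring over a small list of claims. It is not the authors' pipeline: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF variant are assumptions, and the MonoT5 re-ranking and BERTScore stages are not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    query_terms = set(tokenize(query))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1.0 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            tf = term_freq[term]
            length_norm = 1.0 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Invented example claims, not taken from the dataset.
    claims = [
        "masks prevent the spread of covid",
        "wearing masks prevents covid spread",
        "vitamin c cures the flu",
    ]
    for claim, score in zip(claims, bm25_scores("masks stop covid spread", claims)):
        print(f"{score:.3f}  {claim}")
```

In a de-duplication setting, pairs of claims with high lexical scores would be candidates for the more expensive neural comparison.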

The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using Spacy.

The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

The data sources used are:

The CoronaVirusFacts/DatosCoronaVirus Alliance Database. https://www.poynter.org/ifcn-covid-19-misinformation/

CoAID dataset (Cui and Lee, 2020) https://github.com/cuilimeng/CoAID

MM-COVID (Li et al., 2020) https://github.com/bigheiniu/MM-COVID

CovidLies (Hossain et al., 2020) https://github.com/ucinlp/covid19-data

TREC Health Misinformation track https://trec-health-misinfo.github.io/

TREC COVID challenge (Voorhees et al., 2021; Roberts et al., 2020) https://ir.nist.gov/covidSubmit/data.html

The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

The entries in the dataset contain the following information:

Claim. Text of the claim.

Claim label. The labels are: False, and True.

Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

Original information source. Information about which general information source was used to obtain the claim.

Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.
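
The entry layout described above can be modelled as a small record type. This is a sketch only: the attribute names are hypothetical and may not match the column names in the released files. The reported label counts are copied from the description so that a loaded copy can be compared against them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClaimEntry:
    """One dataset entry; attribute names are illustrative, not official."""
    claim: str                      # text of the claim
    label: str                      # "True" or "False"
    claim_source: str               # e.g. a fact-checking website
    original_source: str            # general source the claim was obtained from
    claim_types: list = field(default_factory=list)  # e.g. ["Numerical"]


# Label counts reported in the description for each version.
REPORTED_COUNTS = {
    "LARGE": {"False": 1810, "True": 3333, "total": 5143},
    "SMALL": {"False": 477, "True": 1232, "total": 1709},
}


def label_counts(entries):
    """Count entries per label."""
    return Counter(entry.label for entry in entries)


def matches_reported(entries, version):
    """Check a loaded list of entries against the reported counts."""
    counts = label_counts(entries)
    reported = REPORTED_COUNTS[version]
    return (
        counts["True"] == reported["True"]
        and counts["False"] == reported["False"]
        and len(entries) == reported["total"]
    )
```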

Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

References

Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022. https://arxiv.org/abs/2205.02596

Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at trec-3. Nist Special Publication Sp,109:109.

Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “is this document relevant?. . . probably” a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.

Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.

Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.
List of the image queries
figshare.com
zip
Updated Mar 17, 2022
Cite
Yiltan Bitirim (2022). List of the image queries [Dataset]. http://doi.org/10.6084/m9.figshare.12336275.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12336275.v1
Dataset updated
Mar 17, 2022
Dataset provided by
figshare
Authors
Yiltan Bitirim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The image queries were used in the following studies:
* Y. Bitirim, S. Bitirim, D. Ç. Ertuğrul and Ö. Toygar, “An Evaluation of Reverse Image Search Performance of Google”, 2020 IEEE 44th Annual Computer Software and Applications Conference (COMPSAC), pp. 1368-1372, IEEE, Madrid, Spain, July 2020. (DOI: 10.1109/COMPSAC48688.2020.00-65)
* Y. Bitirim, “Retrieval Effectiveness of Google on Reverse Image Search”, Journal of Imaging Science and Technology, Vol. 66, No. 1, pp. 010505-1-010505-6, January 2022. (DOI: 10.2352/J.ImagingSci.Technol.2022.66.1.010505)
Data from: Balanced binary tree code for scientific applications
search.datacite.org
elsevier.digitalcommonsdata.com
Updated Dec 5, 2019
Cite
S.C. Park (2019). Balanced binary tree code for scientific applications [Dataset]. http://doi.org/10.17632/stnhs36bc5
Explore at:
Unique identifier
https://doi.org/10.17632/stnhs36bc5
Dataset updated
Dec 5, 2019
Dataset provided by
DataCite https://www.datacite.org/
Mendeley
Authors
S.C. Park
License
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license
Description
Abstract: A set of easy-to-use FORTRAN routines for building and accessing data structures of the type commonly encountered in scientific applications is introduced. Fetch and insert times go as O[log(n)], where n is the number of elements in the list. The routines implement AVL or height-balanced binary tree logic. Each tree is a linear integer array. The first ten elements of a tree array specify its structure and the remaining elements are dedicated to node information. Each node includes key and ...

Title of program: BBTREE

Catalogue Id: ABJR_v1_0

Nature of problem: Typical scientific programming applications require numerous calls to one or more subroutines. The intermediate results generated by these calls are usually not saved; if the same information is required at a later stage it is simply recalculated. While wasteful of cpu power, this modus operandi is attractive because it spares the user the time and effort associated with the development of complicated data storage and retrieval algorithms. However, if the number of redundant calls to a particula ...

Versions of this program held in the CPC repository in Mendeley Data: ABJR_v1_0; BBTREE; 10.1016/0010-4655(89)90076-3

This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019).
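
For readers unfamiliar with height-balanced trees, the sketch below shows AVL insertion and lookup in Python. It is not a port of BBTREE: the original stores each tree in a linear integer array in FORTRAN, whereas this illustration uses linked node objects, and all names here are hypothetical.

```python
class Node:
    """A tree node holding a key, an associated value, and its subtree height."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))
    return node


def rotate_right(node):
    pivot = node.left
    node.left, pivot.right = pivot.right, node
    update(node)
    return update(pivot)


def rotate_left(node):
    pivot = node.right
    node.right, pivot.left = pivot.left, node
    update(node)
    return update(pivot)


def balance(node):
    """Restore the AVL invariant (subtree heights differ by at most one)."""
    update(node)
    skew = height(node.left) - height(node.right)
    if skew > 1:
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)    # left-right case
        return rotate_right(node)
    if skew < -1:
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def insert(node, key, value):
    """Insert or overwrite key; returns the new root of this subtree."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
    return balance(node)


def fetch(node, key):
    """Return the value stored under key, or None if it is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


if __name__ == "__main__":
    root = None
    for k in range(1, 1024):          # sorted input, worst case for a plain BST
        root = insert(root, k, k * k)  # cache an "expensive" result under its key
    print(fetch(root, 500), height(root))
```

Storing intermediate results under a key and fetching them later, as in the loop above, is the kind of reuse the "Nature of problem" paragraph describes as an alternative to recalculation.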
RELISH-Aspire
figshare.com
json
Updated Mar 26, 2022
Cite
Sheshera Mysore (2022). RELISH-Aspire [Dataset]. http://doi.org/10.6084/m9.figshare.19425506.v1
Explore at:
Available download formats: json
Unique identifier
https://doi.org/10.6084/m9.figshare.19425506.v1
Dataset updated
Mar 26, 2022
Dataset provided by
figshare
Authors
Sheshera Mysore
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a copy of the RELISH dataset used in the paper "Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity" by Sheshera Mysore, Arman Cohan, Tom Hope. The RELISH dataset was first introduced in Brown et al. 2019. See further details of the paper, how this dataset was compiled, and how it was used: https://github.com/allenai/aspire

The contents of the dataset are as follows:

abstracts-relish.jsonl: jsonl file containing the paper-id, abstracts, and titles for the queries and candidates which are part of the dataset.

relish-queries-release.csv: Metadata associated with every query.

test-pid2anns-relish.json: JSON file with the query paper-id, candidate paper-ids for every query paper in the dataset. Use these files in conjunction with abstracts-relish.jsonl to generate files for use in model evaluation.

relish-evaluation_splits.json: Paper-ids for the splits to use in reporting evaluation numbers.

aspire/src/evaluation/ranking_eval.py included in the GitHub repo accompanying this dataset implements the evaluation protocol and computes evaluation metrics. Please see the paper for descriptions of the experimental protocol we recommend to report evaluation metrics.
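Since the abstracts file is described as JSON Lines (one JSON object per line), a reader might load it along the following lines. This is a minimal sketch under stated assumptions: the helper names `parse_jsonl` and `index_by` and the field name `paper_id` are hypothetical, and the actual keys in abstracts-relish.jsonl should be checked against the file itself.

```python
import json


def parse_jsonl(lines):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def index_by(records, key):
    """Map records by the value under `key` (later duplicates overwrite earlier ones)."""
    return {rec[key]: rec for rec in records}


# Hypothetical two-record sample; the real field names may differ.
sample = [
    '{"paper_id": "p1", "title": "A", "abstract": "..."}',
    '',
    '{"paper_id": "p2", "title": "B", "abstract": "..."}',
]
papers = index_by(parse_jsonl(sample), "paper_id")
```

With a real file, the same helpers would be applied to an open file handle, for example `with open("abstracts-relish.jsonl") as f: papers = index_by(parse_jsonl(f), key)`, where `key` is whatever identifier field the file actually uses.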
text-embedding-dataset
huggingface.co
Updated Feb 12, 2024
Cite
ProfessorBob (2024). text-embedding-dataset [Dataset]. https://huggingface.co/datasets/ProfessorBob/text-embedding-dataset
Explore at:
Dataset updated
Feb 12, 2024
Dataset authored and provided by
ProfessorBob
Description
Text embedding Datasets

The text embedding datasets consist of several (query, passage) paired datasets aiming for text-embedding model finetuning. These datasets are ideal for developing and testing algorithms in the fields of natural language processing, information retrieval, and similar applications.

Dataset Details

Each dataset in this collection is structured to facilitate the training and evaluation of text-embedding models. The datasets are diverse, covering… See the full description on the dataset page: https://huggingface.co/datasets/ProfessorBob/text-embedding-dataset.
Data from: Voice your Opinion! Young Voters’ Usage and Perceptions of a...
dataverse.nl
Updated Aug 30, 2022
+ more versions
Cite
Naomi Kamoen; Christine Liebrecht (2022). Voice your Opinion! Young Voters’ Usage and Perceptions of a Text-based, Voice-based and Text-Voice combined Conversational Agent Voting Advice Application (CAVAA) [Dataset]. http://doi.org/10.34894/MNMLAT
Explore at:
Available download formats: mp4(15303180), pdf(462337), pdf(118719), application/x-spss-sav(64615), application/x-spss-sav(23750)
Unique identifier
https://doi.org/10.34894/MNMLAT
Dataset updated
Aug 30, 2022
Dataset provided by
DataverseNL
Authors
Naomi Kamoen; Christine Liebrecht
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset contains the data from research comparing text, voice, and combined text-voice conversational agent voting advice applications. Conversational Agent Voting Advice Applications (CAVAAs) have been proven to be valuable information retrieval systems for citizens who aim to obtain voting advice based on their answers to political attitude statements but first desire additional on-demand information about the political issues through a chatbot functionality. Research on CAVAAs is relatively young, and in previous studies only the effects of textual CAVAAs have been examined. In light of the positive effects of these tools found in earlier studies, we compared different modalities in which information can be requested to further optimize the design of these information retrieval systems. In an experimental study (N = 60), three CAVAA modalities (text, voice, or a combination of text and voice) were compared on tool evaluation measures (ease of use, usefulness, and enjoyment), political measures (perceived and factual political knowledge), and usage measures (the amount of information retrieved from the chatbot and miscommunication). Results show that the textual and combined CAVAA outperformed the voice CAVAA on several aspects: the voice CAVAA received lower ease of use and usefulness scores, respondents requested less additional information, and they experienced more miscommunication when interacting with the chatbot. Furthermore, given that the predefined buttons were predominantly used and also stimulated users to request more and different types of information, it can be concluded that CAVAAs should make information accessible in an easy way to play into CAVAA users’ processing mode of low elaboration.
Citation Trends for "Why are online catalogs hard to use? Lessons learned...
shibatadb.com
Updated Aug 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yubetsu (2025). Citation Trends for "Why are online catalogs hard to use? Lessons learned from information-retrieval studies" [Dataset]. https://www.shibatadb.com/article/c8uHPFij
Explore at:
Dataset updated
Aug 6, 2025
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
1987 - 2025
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "Why are online catalogs hard to use? Lessons learned from information-retrieval studies".
