A collection of scholarly articles about COVID-19 and the coronavirus family of viruses for use by the global research community. The dataset is updated weekly.
The COVID-19 Open Research Dataset is an extensive machine-readable resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.
The dataset is updated weekly and contains all COVID-19 and coronavirus-related research (e.g., SARS, MERS) from the following sources: PubMed's PMC open access corpus (using this query: COVID-19 and coronavirus research), additional COVID-19 research articles from a corpus maintained by the World Health Organization (WHO), and bioRxiv and medRxiv pre-prints (using this query: COVID-19 and coronavirus research). Also available is a comprehensive metadata file of 44,000 coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic, and the WHO COVID-19 database of publications (includes articles without open access full text).
This GitHub repository contains a downloadable snapshot of the National Institute of Standards and Technology (NIST) COVID-19 Data Repository, curated from the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI. Curated Archive for COVID-19 Research Challenge Dataset: the COVID-19 Data Repository provides searchable CORD-19 data and metadata, including full text extracted from the original CORD-19 JavaScript Object Notation (JSON) files. It is built using the Configurable Data Curation System (CDCS) developed at NIST.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19++ is a citation-aware COVID-19 dataset for the analysis of research dynamics. In addition to primary COVID-19 related articles and preprints from 2020, it includes citations and the metadata of first-order cited work. All publications are annotated with MeSH terms, either from the ground truth, or via ConceptMapper, if no ground truth was available.
The data is organized in CSV files:
Paper metadata (paper_id, publdate, title, data_source): paper.csv
Annotation data, mapping paper_id to MeSH terms: annotation.csv
Authorship data, mapping paper_id to author, optionally with ORCID: authorship.csv
Paired DOIs of citing and cited papers: references.csv
The data_source column in the paper metadata takes the value KE (for metadata from ZB MED KE), PP (for preprints), or CR (for cited resources from CrossRef).
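The file layout above can be explored with a few lines of Python. This is a minimal sketch, not the dataset authors' tooling: the header of paper.csv follows the columns listed above, but the `mesh_term` column name in annotation.csv and all sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; the real data lives in paper.csv.
PAPER_CSV = """paper_id,publdate,title,data_source
p1,2020-03-01,Example preprint,PP
p2,2020-04-15,Example article,KE
p3,2015-06-30,Example cited work,CR
"""

# Assumed header: the description only says this file maps paper_id to MeSH terms.
ANNOTATION_CSV = """paper_id,mesh_term
p1,Coronavirus Infections
p2,Coronavirus Infections
p2,Pandemics
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by_source(papers):
    """Count papers per data_source value (KE, PP or CR)."""
    return Counter(row["data_source"] for row in papers)


def mesh_terms_by_paper(annotations):
    """Group MeSH terms by paper_id, preserving file order."""
    terms = defaultdict(list)
    for row in annotations:
        terms[row["paper_id"]].append(row["mesh_term"])
    return dict(terms)


papers = read_rows(PAPER_CSV)
source_counts = count_by_source(papers)
paper_terms = mesh_terms_by_paper(read_rows(ANNOTATION_CSV))
print(source_counts)
print(paper_terms.get("p2"))
```

To run this against the real files, replace the in-memory strings with the contents of paper.csv and annotation.csv and adjust the assumed column name to the actual header.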
This work was supported by BMBF within the programme "Quantitative Wissenschaftsforschung" under grant numbers 01PU17013A, 01PU17013B, 01PU17013C.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset from Dimensions.ai contains all published articles, preprints, clinical trials, grants and research datasets that are related to COVID-19. This growing collection of research information now amounts to hundreds of thousands of items, and it is the only dataset of its kind. You can find an overview of the content in this interactive Data Studio dashboard: https://reports.dimensions.ai/covid-19/
The full metadata includes the researchers and organizations involved in the research, as well as abstracts, open access status, research categories and much more. You may wish to use the Dimensions web application to explore the dataset: https://covid-19.dimensions.ai/. This dataset is for researchers, universities, pharmaceutical & biotech companies, politicians, clinicians, journalists, and anyone else who wishes to explore the impact of the current COVID-19 pandemic. It is updated daily, and free for anyone to access. Please share this information with anyone you think would benefit from it. If you have any suggestions as to how we can improve our search terms to maximise the volume of research related to COVID-19, please contact us at support@dimensions.ai.
About Dimensions: Dimensions is the largest database of research insight in the world. It contains a comprehensive collection of linked data related to the global research and innovation ecosystem, all in a single platform. This includes hundreds of millions of publications, preprints, grants, patents, clinical trials, datasets, researchers and organizations. Because Dimensions maps the entire research lifecycle, you can follow academic and industry research from early stage funding, through to output and on to social and economic impact. This Covid-19 dataset is a subset of the full database. The full Dimensions database is also available on BigQuery, via subscription. Please visit www.dimensions.ai/bigquery to gain access.
https://zenodo.org/record/3813567/files/COVID.DATA.LIC.AGMT.pdf
Important: This dataset is updated regularly and the latest version for download can be found here: https://www.semanticscholar.org/cord19/download. In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others. By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file. Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.
Dataset content:
Commercial use subset
Non-commercial use subset
PMC custom license subset
bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)
Metadata file
Readme
Each paper is represented as a single JSON object (see schema file for details).
Description: The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:
PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
Additional COVID-19 research articles from a corpus maintained by the WHO
bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)
We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).
We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available. This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service. Citation: When including CORD-19 data in a publication or redistribution, please cite our arXiv pre-print. The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.
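Since each paper is distributed as a single JSON object, a reader typically parses it and flattens the text sections. The following is a minimal sketch under stated assumptions: the field names (`paper_id`, `metadata.title`, `abstract`, `body_text`) and the sample record are assumptions for illustration, so check them against the schema file shipped with the dataset before relying on them.

```python
import json

# Hypothetical, heavily shortened paper; field names are assumptions
# and should be verified against the dataset's schema file.
SAMPLE_PAPER = """
{
  "paper_id": "example-0001",
  "metadata": {"title": "An example coronavirus study"},
  "abstract": [{"text": "Short abstract."}],
  "body_text": [
    {"section": "Introduction", "text": "First paragraph."},
    {"section": "Methods", "text": "Second paragraph."}
  ]
}
"""


def extract_text(paper):
    """Return (paper_id, title, full_text) from one parsed paper object."""
    title = paper.get("metadata", {}).get("title", "")
    paragraphs = [p["text"] for p in paper.get("abstract", [])]
    paragraphs += [p["text"] for p in paper.get("body_text", [])]
    return paper["paper_id"], title, "\n".join(paragraphs)


paper_id, title, full_text = extract_text(json.loads(SAMPLE_PAPER))
print(paper_id, title, len(full_text.split("\n")))
```

Following the recommendation above, titles and other bibliographic fields are better taken from the comprehensive metadata file when a matching record exists, with the parsed JSON used mainly for the text.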
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Data products about the epidemiological, social and economic dimensions of the outbreak. Includes datasets, dashboards, statistics, analyses, trends, charts and maps. Also includes a list of locations where people may have been exposed to the virus.
The DIRECCT study is a multi-phase, living examination of clinical trial results dissemination throughout the COVID-19 pandemic. This dataset contains trials, registrations, and results from Phase 1 of the project, examining trials completed during the first six months of the pandemic (i.e., through 30 June 2020). This dataset is provided as a relational database of three CSVs which can be joined on the id column. Data were collected using a combination of automated and manual strategies; automated searches were performed on 30 June 2020, and manual searches were performed between 21 October 2020 and 18 January 2021. Data sources for trials and registrations include the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) list of registered COVID-19 studies, individual clinical trial registries, and the COVID-19 TrialsTracker (https://covid19.trialstracker.net/). Data sources for results include COVID-19 Open Research Dataset Challenge (CORD-19), PubMed, EuropePMC, Google Scholar, and Google. Additional information on the project is available at the project's OSF page: http://doi.org/10.17605/osf.io/5f8j2
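Joining the three CSVs on the id column can be sketched as follows. Only the `id` column is taken from the description; the other column names (`status`, `registry`, `result_type`) and the sample rows are hypothetical, and the sketch assumes that one trial may have several registrations and results.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; only the shared "id" column comes from the description.
TRIALS_CSV = "id,status\nt1,completed\nt2,completed\n"
REGISTRATIONS_CSV = "id,registry\nt1,registry-A\nt1,registry-B\nt2,registry-A\n"
RESULTS_CSV = "id,result_type\nt1,preprint\n"


def rows_by_id(text):
    """Group the rows of a CSV by their id value."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["id"]].append(row)
    return grouped


def join_on_id(trials_text, registrations_text, results_text):
    """Attach every registration and result row to its trial."""
    registrations = rows_by_id(registrations_text)
    results = rows_by_id(results_text)
    joined = {}
    for trial in csv.DictReader(io.StringIO(trials_text)):
        trial_id = trial["id"]
        joined[trial_id] = {
            "trial": trial,
            "registrations": registrations.get(trial_id, []),
            "results": results.get(trial_id, []),
        }
    return joined


joined = join_on_id(TRIALS_CSV, REGISTRATIONS_CSV, RESULTS_CSV)
trials_without_results = [tid for tid, rec in joined.items() if not rec["results"]]
print(trials_without_results)
```

Keeping registrations and results as lists per trial, rather than flattening to one row per trial, avoids silently dropping trials that appear in several registries.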
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset provides the data underlying the scientific article "Researchers’ willingness and ability to openly share their research data: a survey of COVID-19 pandemic-related factors". The abstract of the article is as follows: While previous studies show that the drivers and inhibitors for openly sharing research data are diverse and complex, there is a lack of studies empirically examining the influence of the COVID-19 pandemic on researchers’ open data sharing behavior. Using a questionnaire (n=135), this study investigates the influence of COVID-19 pandemic-related factors on researchers’ willingness and ability to openly share their research data. Fifty-one respondents (37.8%) stated that factors related to the COVID-19 pandemic increased their willingness and ability to openly share their research data, while 80 (59.3%) reported that various pandemic-related factors did not influence their willingness and ability in this way. As one of the possible influencing factors, this study finds a significant association between the COVID-19-relatedness of researchers’ research discipline and whether or not the COVID-19 pandemic led to a change in their willingness and ability to share their research data openly: χ2 (1) = 5.77, p < .05. Social influences on open data sharing behavior, institutional support for open data sharing, and the fear of potential negative consequences of open data sharing were nearly similar for the respondents who were and were not involved in COVID-19-related research. This study contributes scientifically by going beyond conceptual studies as it provides empirically-based insights concerning the influence of COVID-19 pandemic-related factors on researchers’ willingness and ability to openly share their data. As a practical contribution, this study discusses recommendations that policymakers can use to sustainably support open research data sharing in post-COVID-19 times.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The spreadsheets in the present dataset (CSV format) include the anonymised responses to our online survey of signatories of the Joint Statement on open research and data sharing. Responses have been split into quantitative responses (i.e., closed survey questions) and qualitative responses (i.e., free text survey questions).
This data has been used to inform our final report, which is available in our Zenodo Project Community.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: During the coronavirus pandemic, the way science is done and shared changed, which motivates meta-research to help understand science communication in crises and improve its effectiveness. Objective: To study how many Spanish scientific papers on COVID-19 published during 2020 share their research data. Methodology: Qualitative and descriptive study applying nine attributes: (1) availability, (2) accessibility, (3) format, (4) licensing, (5) linkage, (6) funding, (7) editorial policy, (8) content and (9) statistics. Results: We analyzed 1340 papers, of which 1173 (87.5%) did not have research data. The remaining 12.5% share their research data: 2.1% share their data in repositories, 5% share their data through a simple request, 0.2% do not have permission to share their data and 5.2% share their data as supplementary material. Conclusions: Only a small percentage of papers share their research data, which points to researchers' poor knowledge of how to share research data properly and of what research data is.
https://spdx.org/licenses/CC0-1.0.html
Background This bibliometric analysis examines the top 50 most-cited articles on COVID-19 complications, offering insights into the multifaceted impact of the virus. Since its emergence in Wuhan in December 2019, COVID-19 has evolved into a global health crisis, with over 770 million confirmed cases and 6.9 million deaths as of September 2023. Initially recognized as a respiratory illness causing pneumonia and ARDS, its diverse complications extend to cardiovascular, gastrointestinal, renal, hematological, neurological, endocrinological, ophthalmological, hepatobiliary, and dermatological systems. Methods Identifying the top 50 articles from a pool of 5940 in Scopus, the analysis spans November 2019 to July 2021, employing terms related to COVID-19 and complications. Rigorous review criteria excluded non-relevant studies, basic science research, and animal models. The authors independently reviewed articles, considering factors like title, citations, publication year, journal, impact factor, authors, study details, and patient demographics. Results The focus is primarily on 2020 publications (96%), with all articles being open-access. Leading journals include The Lancet, NEJM, and JAMA, with prominent contributions from Internal Medicine (46.9%) and Pulmonary Medicine (14.5%). China played a major role (34.9%), followed by France and Belgium. Clinical features were the primary study topic (68%), often utilizing retrospective designs (24%). Among 22,477 patients analyzed, 54.8% were male, with the most common age group being 26–65 years (63.2%). Complications affected 13.9% of patients, with a recovery rate of 57.8%. Conclusion Analyzing these top-cited articles offers clinicians and researchers a comprehensive, timely understanding of influential COVID-19 literature. This approach uncovers attributes contributing to high citations and provides authors with valuable insights for crafting impactful research. 
As a strategic tool, this analysis facilitates staying updated and making meaningful contributions to the dynamic field of COVID-19 research. Methods A bibliometric analysis of the most cited articles about COVID-19 complications was conducted in July 2021 using all journals indexed in Elsevier’s Scopus and Thomson Reuters’ Web of Science from November 1, 2019 to July 1, 2021. All journals were selected for inclusion regardless of country of origin, language, medical speciality, or electronic availability of articles or abstracts. The terms were combined as follows: (“COVID-19” OR “COVID19” OR “SARS-COV-2” OR “SARSCOV2” OR “SARS 2” OR “Novel coronavirus” OR “2019-nCov” OR “Coronavirus”) AND (“Complication” OR “Long Term Complication” OR “Post-Intensive Care Syndrome” OR “Venous Thromboembolism” OR “Acute Kidney Injury” OR “Acute Liver Injury” OR “Post COVID-19 Syndrome” OR “Acute Cardiac Injury” OR “Cardiac Arrest” OR “Stroke” OR “Embolism” OR “Septic Shock” OR “Disseminated Intravascular Coagulation” OR “Secondary Infection” OR “Blood Clots” OR “Cytokine Release Syndrome” OR “Paediatric Inflammatory Multisystem Syndrome” OR “Vaccine Induced Thrombosis with Thrombocytopenia Syndrome” OR “Aspergillosis” OR “Mucormycosis” OR “Autoimmune Thrombocytopenia Anaemia” OR “Immune Thrombocytopenia” OR “Subacute Thyroiditis” OR “Acute Respiratory Failure” OR “Acute Respiratory Distress Syndrome” OR “Pneumonia” OR “Subcutaneous Emphysema” OR “Pneumothorax” OR “Pneumomediastinum” OR “Encephalopathy” OR “Pancreatitis” OR “Chronic Fatigue” OR “Rhabdomyolysis” OR “Neurologic Complication” OR “Cardiovascular Complications” OR “Psychiatric Complication” OR “Respiratory Complication” OR “Cardiac Complication” OR “Vascular Complication” OR “Renal Complication” OR “Gastrointestinal Complication” OR “Haematological Complication” OR “Hepatobiliary Complication” OR “Musculoskeletal Complication” OR “Genitourinary Complication” OR “Otorhinolaryngology Complication” OR “Dermatological Complication” OR “Paediatric Complication” OR “Geriatric Complication” OR “Pregnancy Complication”) in the Title, Abstract or Keyword. A total of 5940 articles were accessed, of which the top 50 most cited articles about COVID-19 and Complications of COVID-19 were selected through Scopus. Each article was reviewed for its appropriateness for inclusion. The articles were independently reviewed by three researchers (JRP, MAM and TS) (Table 1). Differences in opinion with regard to article inclusion were resolved by consensus. The inclusion criteria specified articles that were focused on COVID-19 and Complications of COVID-19. Articles were excluded if they did not relate to COVID-19 and/or complications of COVID-19, or if they were basic science research or studies using animal models or phantoms. Review articles, Viewpoints, Guidelines, Perspectives and Meta-analyses were also excluded from the top 50 most-cited articles (Table 1). The top 50 most-cited articles were compiled in a single database and the relevant data were extracted. The database included: Article Title, Scopus Citations, Year of Publication, Journal, Journal Impact Factor, Authors, Number of Authors, Department Affiliation, Number of Institutions, Country of Origin, Study Topic, Study Design, Sample Size, Open Access, Non-Original Articles, Patient/Participants Age, Gender, Symptoms, Signs, Co-morbidities, Complications, Imaging Modalities Used and outcome.
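The way the two term groups are combined (terms joined with OR inside each group, the groups joined with AND) can be sketched in Python. This is an illustrative reconstruction, not the authors' search script: the helper names are hypothetical, only a few of the listed terms are shown, straight quotes replace the typographic ones, and no database-specific field syntax for the Title, Abstract or Keyword restriction is assumed.

```python
def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(disease_terms, complication_terms):
    """Combine the two OR groups with AND, as in the search described above."""
    return f"{or_group(disease_terms)} AND {or_group(complication_terms)}"


# A small subset of the terms listed in the Methods text.
disease_terms = ["COVID-19", "SARS-COV-2"]
complication_terms = ["Stroke", "Pneumonia"]

query = build_query(disease_terms, complication_terms)
print(query)
```

Keeping the terms in lists makes it straightforward to audit the vocabulary or rerun the search with an extended set of complications.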
According to ** percent of the faculty, research funding in the South Asian country of India decreased during the COVID-19 pandemic in 2020. About ** percent of the research faculty stated that international research tie-ups had also declined during the pandemic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coverage of CORD-19 publications by Altmetric.
COVID Symptom Study Sweden collects data through a smartphone app to investigate the prevalence, risk factors, and symptoms associated with COVID-19. To date, over 200,000 volunteers have enrolled in the study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PRIEST study used patient data from the early phases of the COVID-19 pandemic. The PRIEST study provided descriptive statistics of UK patients with suspected COVID-19 in an emergency department cohort, analysis of existing triage tools, and derivation and validation of a COVID-19 specific tool for adults with suspected COVID-19. For more details please go to the study website: https://www.sheffield.ac.uk/scharr/research/centres/cure/priest
Files contained in PRIEST study data repository
Main files include:
PRIEST.csv dataset contains 22445 observations and 119 variables. Data include initial presentation and follow-up, one row per participant.
PRIEST_variables.csv contains variable names, values and brief description.
Additional files include:
Follow-up v4.0 PDF - Blank 30-day follow-up data collection tool
Pandemic Respiratory Infection Form v7 PDF - Blank baseline data collection tool
PRIEST protocol v11.0_17Aug20 PDF - Study protocol
PRIEST_SAP_v1.0_19jun20 PDF - Statistical analysis plan
The PRIEST data sharing plan follows a controlled access model as described in Good Practice Principles for Sharing Individual Participant Data from Publicly Funded Clinical Trials. Data sharing requests should be emailed to priest-study@sheffield.ac.uk. Data sharing requests will be considered carefully as to whether it is necessary to fulfil the purpose of the data sharing request. For approval of a data sharing request an approved ethical review and study protocol must be provided. The PRIEST study was approved by NRES Committee North West - Haydock. REC reference: 12/NW/0303
Parallel to the CORD-19 dataset of scholarly articles, we provide the literature graph LG-covid19-HOTP, composed not only of articles (graph nodes) that are relevant to the study of coronavirus, but also of incoming and outgoing citation links (directed graph edges) that support navigation and search among the articles. The article records are related and connected, not isolated. The graph has been updated weekly since March 26, 2020. The current graph includes 28,669 hot-off-the-press (HOTP) articles since January 2020. It contains 402,946 articles and 3,604,234 links. The link-to-node ratio is markedly higher than that of some other existing literature graphs. In addition to the dataset, we provide more functionality at lg-covid-19-hotp.cs.duke.edu, such as new articles, weekly metadata analysis of publication growth over time, ranking by citation, and statistical near-neighbor embedding maps by similarity in co-citation and similarity in co-reference. Since April 11, we have enabled a novel functionality: self-navigated surf-search over the maps. At the site we also welcome submissions of COVID-19 articles that are missing from the current collection.
References:
Semantic Scholar Open Research Corpus. 2019. Version 2019-11-01. Retrieved from http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/download/. Accessed 2019-12-06.
Elsevier Scopus Citation Overview API. Accessed 2020-03-25.
COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-03-20. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed 2020-03-26. 10.5281/zenodo.3727291
Crossref REST API. Available at www.crossref.org. Accessed 2020-03-25.
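The structure described for LG-covid19-HOTP (articles as nodes, citations as directed edges) and the link-to-node ratio implied by the reported counts can be illustrated with a short Python sketch. The three-edge toy graph and the function names are hypothetical; only the two totals passed to the ratio come from the description.

```python
from collections import defaultdict


def build_graph(edges):
    """Build out-link and in-link maps from (citing, cited) pairs."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    nodes = set()
    for citing, cited in edges:
        out_links[citing].add(cited)
        in_links[cited].add(citing)
        nodes.update((citing, cited))
    return nodes, out_links, in_links


def link_to_node_ratio(num_links, num_nodes):
    """Average number of citation links per article."""
    return num_links / num_nodes


# Hypothetical toy graph: article "a" cites "b" and "c"; "b" cites "c".
edges = [("a", "b"), ("a", "c"), ("b", "c")]
nodes, out_links, in_links = build_graph(edges)

# Counts reported in the description: 3,604,234 links over 402,946 articles.
reported_ratio = link_to_node_ratio(3_604_234, 402_946)
print(round(reported_ratio, 2))
```

With the reported totals this works out to roughly nine citation links per article.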
https://www.pioneerdatahub.co.uk/data/data-request-process/
DECOVID, a multi-centre research consortium, was founded in March 2020 by two United Kingdom (UK) National Health Service (NHS) Foundation Trusts (comprising three acute care hospitals) and three research institutes/universities: University Hospitals Birmingham (UHB), University College London Hospitals (UCLH), University of Birmingham, University College London and The Alan Turing Institute. The original aim of DECOVID was to share harmonised electronic health record (EHR) data from UCLH and UHB to enable researchers affiliated with the DECOVID consortium to answer clinical questions to support the COVID-19 response. The DECOVID database has now been placed within the infrastructure of PIONEER, a Health Data Research (HDR) UK funded data hub that contains data from acute care providers, to make the DECOVID database accessible to external researchers not affiliated with the DECOVID consortium.
This highly granular dataset contains 256,804 spells and 165,414 hospitalised patients. The data includes demographics, serial physiological measurements, laboratory test results, medications, procedures, drugs, mortality and readmission.
Geography: UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UCLH provides first-class acute and specialist services in six hospitals in central London, seeing more than 1 million outpatient and 100,000 admissions per year. Both UHB and UCLH have fully electronic health records. Data has been harmonised using the OMOP data model. Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in other common data models and can build synthetic data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/approvedresearcherscheme
The Public Health Research Database (PHRD) is a linked asset which currently includes Census 2011 data; Mortality Data; Hospital Episode Statistics (HES); and GP Extraction Service (GPES) Data for Pandemic Planning and Research data. Researchers may apply for these datasets individually or in any combination of the current four datasets.
The purpose of this dataset is to enable analysis of deaths involving COVID-19 by multiple factors such as ethnicity, religion, disability and known comorbidities as well as age, sex, socioeconomic and marital status at subnational levels. The dataset comprises 2011 Census data for usual residents of England and Wales who were not known to have died by 1 January 2020, linked by NHS number to death registrations for deaths registered between 1 January 2020 and 8 March 2021. The data exclude individuals who entered the UK in the year before the Census took place (due to their high propensity to have left the UK prior to the study period), and those over 100 years of age at the time of the Census, even if their death was not linked. The dataset contains all individuals who died (any cause) during the study period, and a 5% simple random sample of those still alive at the end of the study period. For usual residents of England, the dataset also contains comorbidity flags derived from linked Hospital Episode Statistics data from April 2017 to December 2019 and GP Extraction Service Data from 2015-2019.
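Because all deaths are included but only a 5% simple random sample of survivors, population-level proportions need the survivors to be weighted up. The sketch below illustrates standard inverse-probability weighting under that sampling design; it is an assumption about how an analyst might proceed, not a method documented for this dataset, and the record layout (`died` flag) and sample rows are hypothetical.

```python
# Each sampled survivor stands for 1 / 0.05 = 20 people; every death is included once.
ALIVE_WEIGHT = 20
DEATH_WEIGHT = 1


def person_weight(died):
    """Inverse-probability weight for one record under the stated sampling design."""
    return DEATH_WEIGHT if died else ALIVE_WEIGHT


def weighted_death_proportion(records):
    """Estimate the proportion who died in the underlying population."""
    total = sum(person_weight(r["died"]) for r in records)
    deaths = sum(DEATH_WEIGHT for r in records if r["died"])
    return deaths / total


# Hypothetical records: 2 deaths and 5 sampled survivors (representing 100 people).
records = [{"died": True}] * 2 + [{"died": False}] * 5
estimate = weighted_death_proportion(records)
print(round(estimate, 4))
```

Computing the proportion directly from the unweighted file (2 of 7 in this toy example) would greatly overstate mortality, which is why the weights matter.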