The COVID-19 Open Research Dataset is an extensive machine-readable resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.
The dataset is updated weekly and contains all COVID-19 and coronavirus-related research (e.g., SARS, MERS) from the following sources: PubMed's PMC open access corpus (using this query: COVID-19 and coronavirus research), additional COVID-19 research articles from a corpus maintained by the World Health Organization (WHO), and bioRxiv and medRxiv pre-prints (using this query: COVID-19 and coronavirus research). Also available is a comprehensive metadata file of 44,000 coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic, and the WHO COVID-19 database of publications (includes articles without open access full text).

Attribution-NoDerivs 4.0 (CC BY-ND 4.0) https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
The Covid-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on Covid-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 75K times and has served as the basis of many Covid-19 text mining and discovery systems.
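Because the full text is distributed as structured JSON, a common first step is to regroup a paper's body paragraphs by section. The sketch below assumes the per-paper layout used in CORD-19 releases, where `body_text` is a list of paragraph objects with `text` and `section` fields; verify this against the schema shipped with the release you download. The function names are hypothetical.

```python
import json
from typing import Dict, List


def full_text_by_section(paper: dict) -> Dict[str, str]:
    """Group the body paragraphs of one CORD-19-style JSON paper by section name.

    Assumes paper["body_text"] is a list of {"text": ..., "section": ...} dicts.
    Sections keep the order in which they first appear.
    """
    sections: Dict[str, List[str]] = {}
    for paragraph in paper.get("body_text", []):
        name = paragraph.get("section") or "(no section)"
        sections.setdefault(name, []).append(paragraph["text"])
    return {name: "\n\n".join(texts) for name, texts in sections.items()}


def load_paper(path: str) -> dict:
    """Read one per-paper JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Paragraphs with an empty section name are collected under a placeholder key rather than dropped.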
The dataset itself does not define a specific task, but a Kaggle challenge defines 17 open research questions to be addressed with the dataset: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks

The DIRECCT study is a multi-phase, living examination of clinical trial results dissemination throughout the COVID-19 pandemic. This dataset contains trials, registrations, and results from Phase 1 of the project, examining trials completed during the first six months of the pandemic (i.e., through 30 June 2020). The dataset is provided as a relational database of three CSVs, which can be joined on the id column. Data was collected using a combination of automated and manual strategies: automated searches were performed on 30 June 2020, and manual searches were performed between 21 October 2020 and 18 January 2021. Data sources for trials and registrations include the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) list of registered COVID-19 studies, individual clinical trial registries, and the COVID-19 TrialsTracker (https://covid19.trialstracker.net/). Data sources for results include the COVID-19 Open Research Dataset Challenge (CORD-19), PubMed, EuropePMC, Google Scholar, and Google. Additional information on the project is available at the project's OSF page: http://doi.org/10.17605/osf.io/5f8j2
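Because the three CSVs share an id column, they can be combined with a plain one-to-many join. The sketch below uses only the standard library and rests on two assumptions. First, each file has a header row with a column literally named `id`. Second, a trial may have several registrations or results, so these are kept as lists. Apart from `id`, the column names are treated as unknown, and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_key(lines: Iterable[str], key: str = "id") -> Dict[str, List[dict]]:
    """Parse CSV lines and group the rows by the join column."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[key]].append(row)
    return groups


def join_on_id(trials: Iterable[str], registrations: Iterable[str],
               results: Iterable[str], key: str = "id") -> List[dict]:
    """Attach every registration and result row to its trial via the shared id.

    Matches are kept as lists (a one-to-many join); trials without any
    matching rows get empty lists.
    """
    regs = group_by_key(registrations, key)
    res = group_by_key(results, key)
    joined = []
    for trial in csv.DictReader(trials):
        trial_id = trial[key]
        joined.append({**trial,
                       "registrations": regs.get(trial_id, []),
                       "results": res.get(trial_id, [])})
    return joined
```

Any iterable of lines works as input, including open file objects (opened with `newline=""`). The actual file names in the release may differ from whatever you pass in.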

This GitHub repository contains a downloadable snapshot of the National Institute of Standards and Technology (NIST) COVID-19 Data Repository, curated from the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI. The COVID-19 Data Repository (Curated Archive for Covid-19 Research Challenge Dataset) provides searchable CORD-19 data and metadata, including full text extracted from the original CORD-19 JavaScript Object Notation (JSON) files. It is built using the Configurable Data Curation System (CDCS) developed at NIST.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19++ is a citation-aware COVID-19 dataset for the analysis of research dynamics. In addition to primary COVID-19 related articles and preprints from 2020, it includes citations and the metadata of first-order cited work. All publications are annotated with MeSH terms, either from the ground truth, or via ConceptMapper, if no ground truth was available.
The data is organized in CSV files:
- Paper metadata (paper_id, publdate, title, data_source): paper.csv
- Annotation data, mapping paper_id to MeSH terms: annotation.csv
- Authorship data, mapping paper_id to author, optionally with ORCID: authorship.csv
- Paired DOIs of citing and cited papers: references.csv
The data_source column in the paper metadata has the value KE (for metadata from ZB MED KE), PP (for preprints), or CR (for cited resources from CrossRef).
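Two quick summaries follow directly from these files: how many papers come from each data_source, and how often each DOI is cited.

The sketch below assumes `paper.csv` has a header row with the column names listed above. It also assumes `references.csv` has a header row followed by the citing DOI and the cited DOI in its first two columns; that column order is an assumption. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_data_sources(paper_lines: Iterable[str]) -> Counter:
    """Count paper.csv rows per data_source value (KE, PP or CR)."""
    return Counter(row["data_source"] for row in csv.DictReader(paper_lines))


def citation_in_degree(reference_lines: Iterable[str]) -> Counter:
    """Count how often each DOI is cited in references.csv.

    Assumes a header row followed by (citing DOI, cited DOI) in the first two
    columns; duplicate citing/cited pairs are counted once.
    """
    reader = csv.reader(reference_lines)
    next(reader, None)  # skip the assumed header row
    pairs = {(row[0], row[1]) for row in reader if len(row) >= 2}
    return Counter(cited for _citing, cited in pairs)
```

Both functions accept open file objects as well as in-memory lists of lines.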
This work was supported by the BMBF within the programme "Quantitative Wissenschaftsforschung" under grant numbers 01PU17013A, 01PU17013B, and 01PU17013C.

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
fastText 300-dimensional vectors built from the COVID-19 Open Research Dataset (CORD-19) with minCount=3.
Only alphanumeric strings were processed; each string had to contain at least one alphabetic character and be at least 2 characters long.
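The token filter described above can be sketched as a vocabulary builder that also emulates fastText's minCount threshold. In actual training, fastText applies minCount itself, for example through the `-minCount` option of its command-line tool. This sketch makes several assumptions that the description does not state:

- "Alphanumeric" means ASCII letters and digits.
- Any other character (such as a hyphen) splits tokens.
- Case is preserved.

The stop word list is not reproduced here, so it is passed in as a parameter. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumption: "alphanumeric" means ASCII letters and digits; anything else splits tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def keep_token(token: str) -> bool:
    """At least 2 characters long and containing at least one letter."""
    return len(token) >= 2 and any(ch.isalpha() for ch in token)


def build_vocab(texts: Iterable[str], min_count: int = 3,
                stop_words: frozenset = frozenset()) -> Dict[str, int]:
    """Emulate the token filter plus fastText's minCount threshold."""
    counts: Counter = Counter()
    for text in texts:
        for token in TOKEN_RE.findall(text):
            if keep_token(token) and token not in stop_words:
                counts[token] += 1
    return {tok: n for tok, n in counts.items() if n >= min_count}
```

Under these rules, "19" is dropped because it has no letter, "a" is dropped because it is too short, and "b2" is kept.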
The following stop words were also not processed:

A Free, Open Resource for the Global Research Community
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community. This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
- Commercial use subset (includes PMC content): 9000 papers, 186 MB
- Non-commercial use subset (includes PMC content): 1973 papers, 36 MB

A collection of scholarly articles about COVID-19 and the coronavirus family of viruses for use by the global research community. The dataset is updated weekly.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets from the following article:
Tan, L., and D. M. Schultz, 2022: How is COVID-19 affected by weather? Meta-regression of 158 studies and recommendations for best practices in future research. Wea. Climate Soc., 14, 237–255, https://doi.org/10.1175/WCAS-D-21-0132.1.

According to ** percent of the faculty, research funding in the South Asian country of India decreased during the COVID-19 pandemic in 2020. About ** percent of the research faculty stated that international research tie-ups had also declined during the pandemic.

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The Allen Institute for AI recently released the COVID-19 Open Research Dataset (CORD-19). This dataset contains 44k research papers related to coronaviruses and other diseases. It poses many challenges for data scientists who want to explore the raw data and provide insights about the virus. BERT and other state-of-the-art models can be used to extract meaning from this raw data, and having BERT embeddings available can ease many NLP tasks on the original data.
The dataset is a single .npz file containing the title embeddings of 40k research papers. You can load it as a NumPy array and use it for NLP tasks such as semantic search, topic modelling, clustering, and question answering (QA).
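A minimal semantic-search sketch over such an array follows; it ranks titles by cosine similarity to a query vector. Several details are not documented and are assumptions here:

- The name of the array inside the .npz archive. The sketch simply takes the first entry.
- The mapping from row index to paper.
- The BERT variant used. A query vector would have to come from the same model.

The function names are hypothetical.

```python
import numpy as np


def load_title_embeddings(path: str) -> np.ndarray:
    """Load the first array stored in an .npz archive.

    The array name inside the archive is not documented, so this takes the
    first entry; inspect np.load(path).files to pick the right one.
    """
    with np.load(path) as archive:
        return archive[archive.files[0]]


def top_k_similar(embeddings: np.ndarray, query: np.ndarray, k: int = 5):
    """Return (row indices, cosine scores) of the k rows most similar to query."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)  # guard against zero rows
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = unit @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]
```

Normalizing the rows once turns cosine similarity into a single matrix-vector product. For repeated queries, you could store the normalized matrix and reuse it.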
To planet Earth

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by Maciej Obarski.
Released under Attribution 4.0 International (CC BY 4.0)

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The COVID-19 pandemic has brought substantial attention to the systems used to communicate biomedical research. In particular, the need to rapidly and credibly communicate research findings has led many stakeholders to encourage researchers to adopt open science practices such as posting preprints and sharing data. To examine the degree to which this has led to the actual adoption of such practices, we examined the "openness" of a sample of 539 published papers describing the results of randomized controlled trials testing interventions to prevent or treat COVID-19. The majority (56%) of the papers in this sample were free to read at the time of our investigation, and 23.56% were preceded by preprints. However, there is no guarantee that the papers without an open license will remain available without a subscription in the future, and only 49.61% of the preprints we identified were linked to the subsequent peer-reviewed version. Of the 331 papers in our sample with statements identifying if (and how) related datasets were available, only a small number indicated that the data were available in a repository that facilitates rapid verification and reuse. Our results demonstrate that, while progress has been made, there is still a significant mismatch between aspiration and actual practice in the adoption of open science in an important area of the COVID-19 literature.

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
These are peer-reviewed supplementary materials for the article 'COVID-19 clinical trials: who is likely to participate and why?' published in the Journal of Comparative Effectiveness Research:
- Appendix 1: Research Participation Survey
- Appendix 2: Statistical Analyses and Results
- Supplemental Table 1: Logistic Regression Models Predicting Intent to Participate in Hypothetical Research Study
Aim: To identify factors associated with willingness to participate in a COVID-19 clinical trial and reasons for and against participating.
Materials & methods: We surveyed Massachusetts (MA, USA) residents online using the Dynata survey platform and by phone using random digit dialing between October and November 2021. Respondents were asked to imagine they were hospitalized with COVID-19 and invited to participate in a treatment trial. We assessed willingness to participate by asking, "Which way are you leaning" and why. We used multivariable logistic regression to model factors associated with leaning toward participation. Open-ended responses were analyzed using conventional content analysis.
Results: Of 1071 respondents, 65.6% leaned toward participating. Multivariable analyses revealed that college education (OR: 1.59; 95% CI: 1.11, 2.27), trust in the healthcare system (OR: 1.32; 95% CI: 1.10, 1.58), and relying on doctors (OR: 1.77; 95% CI: 1.45, 2.17) and on family or friends (OR: 1.31; 95% CI: 1.11, 1.54) to make health decisions were significantly associated with leaning toward participating. Respondents with lower health literacy (OR: 0.57; 95% CI: 0.36, 0.91) and those who identify as Black (OR: 0.40; 95% CI: 0.24, 0.68), Hispanic (OR: 0.61; 95% CI: 0.38, 0.98), or Republican (OR: 0.61; 95% CI: 0.38, 0.97) were significantly less likely to lean toward participating. Common reasons for participating included helping others, benefitting oneself, and deeming the study low risk. Common reasons for leaning against participating were deeming the study high risk, disliking experimental treatments, and not wanting to be a guinea pig.
Conclusion: Our finding that vulnerable individuals and those with lower levels of trust in the healthcare system are less likely to be receptive to participating in a COVID-19 clinical trial highlights that work is needed to achieve a healthcare system that gives historically disadvantaged groups confidence that their participation in research will benefit their community.
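In logistic regression, an odds ratio is OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). Under that assumption, the log odds ratio and its standard error can be recovered from any reported interval. The sketch below does this and derives an approximate two-sided p-value.

The article may have used a different interval method, and the published numbers are rounded, so results are approximate. The function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def log_or_and_se(lower: float, upper: float, z: float = Z_975):
    """Recover the log odds ratio and its standard error from a 95% CI.

    Assumes a Wald interval, exp(beta +/- z * SE), which is symmetric on the
    log scale, so beta is the midpoint of the log bounds.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2.0
    se = (log_hi - log_lo) / (2.0 * z)
    return beta, se


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0, i.e. 2 * (1 - Phi(|beta| / SE))."""
    return math.erfc(abs(beta) / se / math.sqrt(2.0))


# College education: reported OR 1.59, 95% CI 1.11 to 2.27.
beta, se = log_or_and_se(1.11, 2.27)
print(f"OR ~ {math.exp(beta):.2f}, SE(log OR) ~ {se:.3f}, p ~ {wald_p_value(beta, se):.3f}")
```

The recovered OR is the geometric mean of the CI bounds. A CI that excludes 1 corresponds to p < 0.05 under the same Wald assumption.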

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions about the mean, or in working with groups appropriately modeled as a continuum, this data is immediately informative. However, coupling such population-level data to a model that requires individual-level information raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage a specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental disease model that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We find that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that, used appropriately, this data can inform public health policies and have an impact on pandemic responses.
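The abstract does not spell out the model equations. The sketch below is therefore a generic illustration, not the authors' model:

- It is a discrete-time SIR model with an extra compartment V for vaccinated susceptibles.
- V's infection risk is scaled by 1 - vacc_efficacy.
- Regions are coupled by an absolute daily flow matrix, of the kind the authors approximate from GAMRD.

All names and parameters (`Region`, `step`, `beta`, `gamma`, `vacc_efficacy`, `flows`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Compartments for one region (counts of people)."""
    S: float  # unvaccinated susceptible
    V: float  # vaccinated, partially protected
    I: float  # infectious
    R: float  # recovered / removed

    @property
    def total(self) -> float:
        return self.S + self.V + self.I + self.R


def step(regions: List[Region], flows: List[List[float]],
         beta: float, gamma: float, vacc_efficacy: float) -> None:
    """Advance the metapopulation model by one day, in place."""
    # 1. Local transmission and recovery within each region (forward Euler).
    for r in regions:
        n = r.total
        if n <= 0:
            continue
        force = beta * r.I / n
        inf_s = force * r.S
        inf_v = force * (1.0 - vacc_efficacy) * r.V
        recovered = gamma * r.I
        r.S -= inf_s
        r.V -= inf_v
        r.I += inf_s + inf_v - recovered
        r.R += recovered

    # 2. Movement: flows[i][j] is the absolute number of people travelling
    #    from region i to region j per day. Movers are drawn proportionally
    #    from every compartment, so infection and vaccination status travel too.
    deltas = [[0.0] * 4 for _ in regions]
    for i, src in enumerate(regions):
        n = src.total
        outflow = sum(p for j, p in enumerate(flows[i]) if j != i)
        if n <= 0 or outflow == 0:
            continue
        if outflow > n:
            raise ValueError(f"outflow from region {i} exceeds its population")
        comps = (src.S, src.V, src.I, src.R)
        for j, people in enumerate(flows[i]):
            if j == i or people == 0:
                continue
            frac = people / n
            for c, value in enumerate(comps):
                deltas[i][c] -= frac * value
                deltas[j][c] += frac * value
    # Apply all movements at once so the update is simultaneous across regions.
    for r, d in zip(regions, deltas):
        r.S += d[0]
        r.V += d[1]
        r.I += d[2]
        r.R += d[3]
```

Both stages conserve the total population. With a daily forward-Euler step, beta and gamma must be small enough that no compartment goes negative.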

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The spreadsheets in the present dataset (CSV format) include the anonymised responses to our online survey of signatories of the Joint Statement on open research and data sharing. Responses have been split into quantitative responses (i.e., closed survey questions) and qualitative responses (i.e., free text survey questions).
This data has been used to inform our final report, which is available in our Zenodo Project Community.

As of February 2025, a total of ** clinical studies targeting COVID-19 in Mexico were in phase *. Meanwhile, ***** COVID-19 clinical trials were in early phase * in the North American country. As of June 3, 2022, there were over ***** drugs and vaccines in development targeting the coronavirus disease (COVID-19) worldwide.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: During the coronavirus pandemic, the way science is done and shared changed, which motivates meta-research to help understand science communication in crises and improve its effectiveness. Objective: To study how many Spanish scientific papers on COVID-19 published during 2020 share their research data. Methodology: A qualitative and descriptive study applying nine attributes: (1) availability, (2) accessibility, (3) format, (4) licensing, (5) linkage, (6) funding, (7) editorial policy, (8) content, and (9) statistics. Results: We analyzed 1340 papers, of which 1173 (87.5%) did not have research data. The remaining 12.5% share their research data: 2.1% in repositories, 5% upon request, and 5.2% as supplementary material, while 0.2% do not have permission to share their data (all percentages are of the 1340 papers). Conclusions: Only a small percentage of papers share their research data, and this demonstrates researchers' poor knowledge of how to properly share research data and of what constitutes research data.

NEW in Version 18: Besides our regular update, we have now included the tweet identifiers and their respective tweet location place country codes for the clean version of the dataset. These are found in the clean_place_country.tar.gz file; each file is identified by the two-character ISO country code as the file suffix.
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets related to COVID-19 chatter, acquired from the Twitter Stream. Since our first release, we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added data provided by our new collaborators from January 27th to March 27th to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12, we have included daily hashtags, mentions, and emojis and their frequencies in the respective zip files. From version 14, we have included the tweet identifiers and their respective language for the clean version of the dataset. These are found in the clean_languages.tar.gz file; each file is identified by the two-character language code as the file suffix.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (490,385,226 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (120,722,431 unique tweets). We keep the retweets for several practical reasons, one of which is tracing important tweets and their dissemination. For NLP tasks, we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general daily statistics are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files. For more statistics and some visualizations, visit: http://www.panacealab.org/covid19/
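Frequency lists like the top terms, bigrams, and trigrams can be reproduced on hydrated text with a simple n-gram counter. This sketch is not the authors' pipeline. Their tokenization and stop-word handling are not described here, so lowercasing plus whitespace splitting is an assumption, and `top_ngrams` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ngrams(texts: Iterable[str], n: int, k: int = 1000) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent n-grams (as token tuples) with their counts."""
    counts: Counter = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumption: lowercase + whitespace tokens
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)
```

Use n=1, 2, and 3 for terms, bigrams, and trigrams respectively. Ties keep the order in which the n-grams were first seen.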
More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our preprint about the dataset (https://arxiv.org/abs/2004.03688).
As always, the tweets distributed here are only tweet identifiers (with date and time added), in accordance with Twitter's terms and conditions for redistributing Twitter data, and are provided ONLY for research purposes. They need to be hydrated before they can be used.
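Hydration means looking the IDs up again through the Twitter API, whose tweet lookup endpoints accept at most 100 IDs per request. The sketch below reads IDs from one of the TSV files and groups them into request-sized batches. The actual HTTP call is omitted; a hydration tool such as twarc can handle it. A header column named `tweet_id` is an assumption, and the function names are hypothetical. Check the current API terms before use.

```python
import csv
from typing import Iterable, Iterator, List


def read_tweet_ids(lines: Iterable[str], id_column: str = "tweet_id") -> Iterator[str]:
    """Yield tweet IDs from a tab-separated file with a header row."""
    for row in csv.DictReader(lines, delimiter="\t"):
        yield row[id_column]  # keep IDs as strings to avoid any precision loss


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size`, one list per lookup request."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Both functions are generators, so even full_dataset.tsv can be streamed from an open file without loading it into memory.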

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains files to assist in building question-answer models for the CORD-19 dataset.
Files included:
- cord19.txt: Line-by-line export of CORD-19 data with a focus on high-quality, study-design-detected articles
- cord19-qa.csv: CSV rows of question, context, answer combinations for the CORD-19 dataset
- cord19-qa.json: SQuAD 2.0 formatted question, context, answer combinations
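SQuAD 2.0 nests its data as follows:

- Articles contain paragraphs.
- Each paragraph has a `context` and a list of `qas`.
- Unanswerable questions carry `is_impossible: true` with an empty `answers` list.

The sketch below flattens such a file into (question, context, answers, is_impossible) tuples. It assumes cord19-qa.json uses the standard SQuAD 2.0 field names, and the function names are hypothetical.

```python
import json
from typing import Iterator, List, Tuple


def iter_squad_examples(squad: dict) -> Iterator[Tuple[str, str, List[str], bool]]:
    """Yield (question, context, answer_texts, is_impossible) from a SQuAD 2.0 dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa.get("answers", [])]
                yield qa["question"], context, answers, qa.get("is_impossible", False)


def load_squad(path: str) -> List[Tuple[str, str, List[str], bool]]:
    """Read a SQuAD 2.0 JSON file and return all examples as a list."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))
```

Flat tuples like these are a convenient starting point for comparing the JSON file against cord19-qa.csv or for feeding a question-answering pipeline.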
Transformer models fine-tuned for language modeling, for SQuAD 2.0, and on this dataset can be used with Hugging Face Transformers.