This repository, part of the ACTIV TRACE initiative, houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples. This information is then presented in the Variant Call Format (VCF). Each VCF file corresponds to the SRA parent-run's accession ID. Additionally, the data is available in Parquet format, which makes it easier to search and filter with Amazon Athena. The SARS-CoV-2 Variant Calling Pipeline is designed to handle new data every six hours, with updates to the AWS ODP bucket occurring daily.
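As a rough illustration of what a VCF record carries, the sketch below parses the eight fixed tab-separated columns that the VCF specification defines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The sample line, including its DP value, is made up for illustration and is not taken from this repository; the helper names `parse_vcf_line` and `read_vcf` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: List[str]
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one VCF data line using the eight fixed columns of the format."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_map: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # "." marks an empty INFO field
        key, _, value = item.partition("=")
        info_map[key] = value  # flag-style keys get an empty value
    return VcfRecord(chrom, int(pos), ref, alt.split(","), info_map)


def read_vcf(text: str) -> List[VcfRecord]:
    """Skip header lines (starting with '#') and parse the remaining lines."""
    return [
        parse_vcf_line(line)
        for line in text.splitlines()
        if line and not line.startswith("#")
    ]


# Illustrative record only; the values are not from the dataset.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=100\n"
)

records = read_vcf(SAMPLE_VCF)
```

For real files, an established VCF parser is likely a better choice than hand-rolled splitting; the sketch only shows the record layout.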
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An attempt to tabulate the number of confirmed SARS-CoV-2 cases in France by region. The data come from the media and from government health websites. The aim of the document is to be as accurate as possible.
Sources:
https://solidarites-sante.gouv.fr/soins-et-maladies/maladies/maladies-infectieuses/coronavirus/article/points-de-situation-coronavirus-covid-19
https://www.santepubliquefrance.fr/maladies-et-traumatismes/maladies-et-infections-respiratoires/infection-a-coronavirus/articles/infection-au-nouveau-coronavirus-sars-cov-2-covid-19-france-et-monde
https://france3-regions.francetvinfo.fr
https://www.ars.sante.fr/
https://www.facebook.com/MinSoliSante/
https://geodes.santepubliquefrance.fr/#c=home
Other relevant sources on data.gouv.fr:
https://www.data.gouv.fr/fr/datasets/chiffres-cles-concernant-lepidemie-de-covid19-en-france/
https://www.data.gouv.fr/fr/reuses/visualisation-et-analyse-covid-19-monde-france-regions-francaises/
https://www.data.gouv.fr/fr/reuses/tableau-de-bord-de-suivi-de-lepidemie-de-covid19/
Other:
https://www.arcgis.com/apps/opsdashboard/index.html#/3a278da2d7ab4a8a8e1b4ea8bea7121b
https://www.esrifrance.fr/coronavirus-ressources.aspx
https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
https://nextstrain.org/ncov
2019 Novel Coronavirus COVID-19 (2019-nCoV) Visual Dashboard and Map:
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Downloadable data:
https://github.com/CSSEGISandData/COVID-19
Additional Information about the Visual Dashboard:
https://systems.jhu.edu/research/public-health/ncov
As of May 2, 2023, the outbreak of the coronavirus disease (COVID-19) had been confirmed in almost every country in the world. The virus had infected over 687 million people worldwide, and the number of deaths had reached almost 6.87 million. The most severely affected countries include the U.S., India, and Brazil.
COVID-19: background information
COVID-19 is caused by a novel coronavirus that had not previously been identified in humans. The first case was detected in the Hubei province of China at the end of December 2019. The virus is highly transmissible, and coughing and sneezing are the most common forms of transmission. This is similar to the outbreak of the SARS coronavirus that began in 2002, which was thought to have spread via cough and sneeze droplets expelled into the air by infected persons.
Naming the coronavirus disease
Coronaviruses are a group of viruses that can be transmitted between animals and people, causing illnesses that may range from the common cold to more severe respiratory syndromes. In February 2020, the International Committee on Taxonomy of Viruses and the World Health Organization announced official names for both the virus and the disease it causes: SARS-CoV-2 and COVID-19, respectively. The name of the disease is derived from the words corona, virus, and disease, while the number 19 represents the year that it emerged.
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The outbreak of coronavirus disease (COVID-19) poses a great threat to global public health. At present, the number of newly confirmed COVID-19 cases and deaths is increasing worldwide. The strategy of comprehensive and scientific detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) through quantitative real-time polymerase chain reaction (qRT-PCR) for special populations and environments provides great support for the prevention and control of this pandemic in China. Our study focused on determining the factors associated with the length of time from symptom onset to the first positive nucleic acid test of throat swabs in COVID-19 patients, evaluating the effect of early positive nucleic acid detection on the disease severity and its significance in prognosis, and predicting the factors associated with the time from positive SARS-CoV-2 RNA test to negative conversion (negative conversion of SARS-CoV-2 virus) in COVID-19 patients.
Methods: This study included 116 hospitalized patients with COVID-19 from January 30, 2020 to March 4, 2020 in Wuhan, China. Throat swab samples were collected for qRT-PCR testing of SARS-CoV-2 RNA, and all patients included in this study were positive for this test.
Results: The multivariate Cox proportional hazards model showed that disease severity (HR = 0.572; 95% CI 0.348–0.942; p = 0.028) was a protective factor for the time from symptom onset to positive nucleic acid detection. Meanwhile, the time from symptom onset to positive nucleic acid detection (HR = 1.010; 95% CI 1.005–1.020; p = 0.0282) was an independent risk factor for the delay in negative conversion time of SARS-CoV-2 virus. However, the severity of the disease (HR = 1.120; 95% CI 0.771–1.640; p = 0.544) had no correlation with the negative conversion time of SARS-CoV-2 virus.
Conclusions: Patients with more severe disease had a shorter time from symptom onset to a positive nucleic acid test. Prolonged time from symptom onset to positive nucleic acid test was an independent risk factor for the delay in negative conversion time of SARS-CoV-2 virus, and the severity of the disease had no correlation with negative conversion time of SARS-CoV-2 virus.
A centralized repository of up-to-date and curated datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19. Globally, there are several efforts underway to gather this data, and we are working with partners to make this crucial data freely available and keep it up-to-date. Hosted on the AWS cloud, we have seeded our curated data lake with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI.
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.
Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 33 data element restricted access dataset.
The following apply to the public use datasets and the restricted access dataset:
Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (<a href="https://cdn.ymaws.com/www.cste.org/resource/resmgr/ps/positionstatement2020/Interim-20-ID-01_COVID
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), often referred to simply as the coronavirus, is the virus that causes COVID-19 and is responsible for the ongoing pandemic. Since the first SARS-CoV-2 genome was made available, several new variants have been found and sequenced. The following is a small attempt to get a better understanding of the SARS-CoV-2 variants.
The data set contains the normalized frequency of several k-mers in the SARS-CoV-2 sequence. The headers indicate the k-mer, and the id is the NCBI identifier.
The original sequences were downloaded from the NCBI SARS-CoV-2 Resources page. Only correctly sequenced variants were selected to create the data.
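The description does not say how the k-mer frequencies were normalized. A minimal sketch, assuming each count is divided by the total number of valid k-mers in the sequence and that windows containing ambiguous bases (anything other than A, C, G, T) are skipped, could look like this. The function name `kmer_frequencies` is hypothetical.

```python
from collections import Counter
from typing import Dict

VALID_BASES = set("ACGT")


def kmer_frequencies(sequence: str, k: int) -> Dict[str, float]:
    """Return the relative frequency of every k-mer found in `sequence`.

    Assumption: frequency = count / total number of valid k-mer windows.
    Windows containing characters outside A, C, G, T are ignored.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= VALID_BASES:
            counts[window] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy sequence, not a real genome fragment.
freqs = kmer_frequencies("ACGTAC", 2)
```

With this normalization the frequencies of one sequence sum to 1, which makes sequences of different lengths comparable.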
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Our dataset was in use by national and local news organizations across the United States and by research projects and agencies worldwide.
Every day, we collected data on COVID-19 testing and patient outcomes from all 50 states, 5 territories, and the District of Columbia by visiting official public health websites for those jurisdictions and entering reported values in a spreadsheet. The files in this dataset represent the entirety of our COVID-19 testing and outcomes data collection from March 7, 2020 to March 7, 2021. This dataset includes official values reported by each state on each day of antigen, antibody, and PCR test result totals; the total number of probable and confirmed cases of COVID-19; the number of people currently hospitalized, in intensive care, and on a ventilator; the total number of confirmed and probable COVID-19 deaths; and more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clinical, laboratory, and radiological features of COVID-19.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Covid-19 characteristics per case, nationwide
The Netherlands has reached an endemic phase for the SARS-CoV-2 virus (coronavirus), and the testing facilities of the GGD (the municipal public health service, PHS) closed as of March 17, 2023. As a result, the data is no longer updated as of April 1, 2023.
File from week 40, 2021: COVID-19_casus_landelijk
File up to and including week 39, 2021: COVID-19_casus_landelijk_tm. This file is no longer updated as of version 5 (see below).
Available formats: .csv and .json
Source system: OSIRIS General Infectious Diseases (AIZ)
File description: This file contains the following characteristics per positively tested case in the Netherlands: Date for statistics, Age group, Sex, Death, Week of death, Province, Notifying PHS
The file is structured as follows: One record for every lab-confirmed COVID-19 patient in the Netherlands since the start of the pandemic. As of July 11, 2022, this data has been split (see the description of version 5). Only the file from week 40, 2021 onwards is refreshed every Tuesday and Friday at 4:00 PM, based on the data as registered at 10:00 AM that day in the national system for notifiable infectious diseases (Osiris AIZ). The historical file (up to and including week 39, 2021) is no longer updated as of July 11, 2022.
Description of the variables:
Version: Version number of the dataset. When the content of the dataset is structurally changed (so not the daily update or a correction at record level), the version number is incremented (+1), along with the corresponding metadata in RIVMdata (https://data.rivm.nl).
Version 2 update (January 20, 2022): - In version 2 of this dataset, the variable 'hospital_admission' is no longer available. For the number of hospital admissions, see the registered hospital admissions of the NICE Foundation (data.rivm.nl/covid-19/COVID-19_ziekenhuisopnames.html).
Version 3 update (February 8, 2022): - From February 8, 2022, positive SARS-CoV-2 test results are reported directly from CoronIT to the RIVM. The test results of other test providers (such as Testen voor Toegang, "Testing for Access") and healthcare institutions (such as hospitals, nursing homes and general practitioners) that enter their positive SARS-CoV-2 test results via the Reporting Portal (Meldportaal) of GGD GHOR are also reported directly to the RIVM. Reports that are part of the source and contact investigation sample, and positive SARS-CoV-2 test results from healthcare institutions that are reported to the PHS via healthcare email (zorgmail), are still reported to the RIVM via HPZone. From February 8, 2022, the date of the positive test result is used and no longer the date of notification to the PHS.
Version 4 update (March 24, 2022): - In version 4 of this dataset, records are compiled according to the municipal reclassification of March 24, 2022. See the description of the Municipal_health_service variable for more information.
Version 5 update (July 11, 2022): - As of July 11, 2022, this dataset is split into two parts. The first part contains the data from the start of the pandemic up to and including October 3, 2021 (week 39) and contains "tm" in the file name. This data is no longer updated. The second part contains the data from October 4, 2021 (week 40) and is updated every working day.
Version 6 update (September 1, 2022): - From September 1, 2022, the second part of the data (from week 40, 2021) is no longer updated every working day, but on Tuesdays and Fridays. On these days the data is updated retroactively for the other days.
Version 7 update (January 3, 2023): - As of January 1, 2023, the RIVM no longer collects additional information. As a result, deaths are no longer reported from January 1, 2023, and the [Deceased] and [Week of Death] columns are no longer filled in.
Date_file: Date and time when the data was published by the RIVM
Date_statistics: Date for statistics; first day of illness, if not known, date of positive lab result, if not known, date of notification to the PHS (format: yyyy-mm-dd)
Date_statistics_type: Type of date that was available for the "Date for statistics" variable, where: DOO = Date of disease onset: First day of illness as reported by the PHS. Please note: it is not always known whether this first day of illness actually concerned Covid-19. DPL = Date of first Positive Lab result: Date of the (first) positive lab result. DON = Date of Notification: Date on which the notification was received by the PHS.
Agegroup: Age group while alive: 0-9, 10-19, ..., 90+; at death: <50, 50-59, 60-69, 70-79, 80-89, 90+; Unknown = unknown
Sex: Sex; Unknown, Male, Female
Province: Name of the province (based on the patient's place of residence)
Deceased: Death. Unknown, Yes, No. From January 1, 2023, this column is empty.
Week of Death: Week of death. YYYYMM according to ISO week notation (weeks run from Monday through Sunday). From January 1, 2023, this column is empty.
Municipal_health_service: The PHS (GGD) that made the notification. From March 24, 2022, this file is compiled according to the municipal classification of March 24, 2022. The municipality of Weesp has merged into the municipality of Amsterdam. With this classification, the Gooi- en Vechtstreek safety region has become smaller and the Amsterdam-Amstelland safety region larger; GGD Amsterdam has become larger and GGD Gooi- en Vechtstreek smaller (https://www.cbs.nl/nl-nl/onze-diensten/methoden/classificaties/overig/gemeentelijke-indelingen-per-jaar/indeling-per-jaar/gemeentelijke-indeling-op-1-januari-2022).
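The fallback order described for Date_statistics and Date_statistics_type (disease onset first, then first positive lab result, then notification) can be sketched as a small function. This is only an illustration of the documented rule, not RIVM's implementation; the function and parameter names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def date_for_statistics(
    onset: Optional[date],
    positive_lab: Optional[date],
    notification: Optional[date],
) -> Tuple[date, str]:
    """Pick the date for statistics and its type code.

    Order follows the variable description:
    DOO (disease onset), else DPL (first positive lab result),
    else DON (notification).
    """
    if onset is not None:
        return onset, "DOO"
    if positive_lab is not None:
        return positive_lab, "DPL"
    if notification is not None:
        return notification, "DON"
    raise ValueError("at least one date is required")


# Illustrative dates only.
chosen = date_for_statistics(None, date(2022, 3, 1), date(2022, 3, 2))
```

Keeping the type code next to the date lets an analysis filter on, for example, onset dates only, since the three kinds of date are not directly comparable.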
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Covid-19 reporting of SARS-CoV-2 variants in the Netherlands through the random sample of RT-PCR positive samples in the national surveillance of virus variants.
This file contains the following numbers:
- Number per VOC, VOI and VUM detected per week
- Total number of measurements, the denominator, per weekly sample
These are split into the Variants of Concern (VOC), Variants of Interest (VOI) and Variants Under Monitoring (VUM) designated by the WHO (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/) and/or ECDC (https://www.ecdc.europa.eu/en/covid-19/variants-concern). The week to which a sample belongs is based on the date of sampling. The numbers are based on the random sample from the virus variant surveillance, which means that samples belonging to outbreaks are not included in the data.
The file is structured as follows:
- One record per SARS-CoV-2 variant designated as VOC, VOI or VUM, per week.
This file is updated weekly on Fridays. This information is generated differently from rapid tests and PCR tests: more advanced machines are used that have a longer run time than, for example, the machines used for PCR testing. Because of all the logistics involved, it is not feasible to form a representative picture of the most recent two weeks, so these are not reported. Additionally, the virus variant surveillance project has been operational since October 2020, with an increasing number of weekly samples until early to mid-January 2021; older data is therefore not available. For all reported data, the instructions, definitions and footnotes stated on https://www.rivm.nl/coronavirus-covid-19/virus/varianten take precedence. Please note that, because variant name definitions change internationally as scientific insight advances, the records in the data presented here may be adjusted.
Changelog:
Version 2 update (October 29, 2021): - A WHO_category column has been added with the current variant category (VOC/VOI/VUM) as assigned by the WHO. - In addition to the VOC and VOI categories, the VUM category is now also included in the file.
Version 3 update (December 10, 2021): - A column May_include_samples_listed_before has been added, with the value TRUE whenever it is possible for the reported Variant_cases to aggregate samples that have already been included in a previous variant in the table. When this is not possible, the value is FALSE.
Version 4 update (July 8, 2022): - The May_include_samples_listed_before column has been replaced by an Is_subvariant_of column. If this variant is a subvariant of another variant mentioned, this column contains a value that corresponds to the Variant_code of the other variant. The numbers (Variant_cases) of this subvariant are a subset of those of the other variant.
Description of the variables:
Version: Version number of the dataset. When the content of the dataset is structurally changed (so not the weekly update or a correction at record level), the version number is incremented (+1), along with the corresponding metadata in RIVM data (data.rivm.nl).
Date_of_report: Date and time when the data file was last updated by the RIVM. Notation: YYYY-MM-DD hh:mm:ss.
Date_of_statistics_week_start: The date of the Monday (the first day of that week) for which the numbers per week are presented. The last day of the week is Sunday. Notation: YYYY-MM-DD.
Variant_code: Scientific name of the SARS-CoV-2 variant based on Pangolin nomenclature. Can contain letters, numbers and periods.
Variant_name: Current WHO label of the SARS-CoV-2 variant. Consists of letters only.
ECDC_category: Indicates whether it is a Variant of Concern (VOC), Variant of Interest (VOI), Variant under Monitoring (VUM), or De-escalated Variant (DEV) according to ECDC's current definitions. For more information see also: https://www.ecdc.europa.eu/en/covid-19/variants-concern.
WHO_category: Indicates whether it is a Variant of Concern (VOC), Variant of Interest (VOI) or Variant under Monitoring (VUM) according to the current WHO definitions. For more information see also: https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/
Is_subvariant_of: If this variant is a subvariant of another variant that has been mentioned, this column contains a value that corresponds to the Variant_code of the other variant. The numbers (Variant_cases) of this subvariant are a subset of those of the other variant.
Sample_size: Shows the total sample size in that week. Consists of whole numbers only.
Variant_cases: Shows in how many cases from that week's sample the specific VOC, VOI or VUM was found. Consists of whole numbers only.
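Since Sample_size is described as the denominator for Variant_cases, a weekly share per variant can be derived by dividing one by the other. The sketch below uses the column names from the variable description; the semicolon delimiter and the sample rows are assumptions made for illustration, and `variant_shares` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Tuple

# Made-up rows for illustration; not real surveillance numbers.
SAMPLE_CSV = """Date_of_statistics_week_start;Variant_code;Sample_size;Variant_cases
2022-01-03;B.1.617.2;1000;150
2022-01-03;B.1.1.529;1000;850
"""


def variant_shares(text: str, delimiter: str = ";") -> Dict[Tuple[str, str], float]:
    """Map (week start, variant code) to Variant_cases / Sample_size."""
    shares: Dict[Tuple[str, str], float] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        sample_size = int(row["Sample_size"])
        if sample_size == 0:
            continue  # no denominator, so no share can be computed
        key = (row["Date_of_statistics_week_start"], row["Variant_code"])
        shares[key] = int(row["Variant_cases"]) / sample_size
    return shares


shares = variant_shares(SAMPLE_CSV)
```

Because the counts of a subvariant are a subset of those of its parent variant (see Is_subvariant_of), shares within one week do not necessarily add up to 1.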
Daily count of NYC residents who tested positive for SARS-CoV-2, who were hospitalized with COVID-19, and deaths among COVID-19 patients.
Note that this dataset currently pulls from https://raw.githubusercontent.com/nychealth/coronavirus-data/master/case-hosp-death.csv on a daily basis.
https://www.iddo.org/tools-resources/data-use-agreement
Clinical data from patients hospitalised with COVID19 in the United States of America, shared as a part of the ISARIC Clinical Characterisation Group collaboration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets in this publication report the number of diagnoses with coronavirus disease (COVID-19) as reported by the Department of Health in Ireland. This includes new cases diagnosed per day and cumulative cases, hospitalisations, ICU admissions, deaths, number of healthcare workers, number of clusters, gender of cases, age groups of cases, mode of transmission, age groups of those hospitalised, and cases per county. To aid standardisation of age groups and cases per county, the population estimates by age group for 2019 and the actual county population in the 2016 Census from Ireland's Central Statistics Office are also included as separate datasets, to allow expression of cases per million population.
These are
age_population_cso_2019.csv has been updated to include separate population estimates for those aged 65-74 years, 75-84 years, and 85 years and over. This is in response to the HPSC releasing case and hospitalisation data for these groups rather than for a combined 65 years and over group.
counties_population_cso_2016.csv has been updated to remove trailing spaces in the 'county' column.
doh_covid_ie_cases_analysis.csv is regularly updated at https://github.com/frankmoriarty/covid_ie/blob/master/doh_covid_ie_cases_analysis.csv
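The per-million standardisation mentioned above can be sketched as follows. This is a minimal Python example under stated assumptions: the 'county' column is named in the description, but the 'population' column name, the county names and all numbers here are invented for illustration.

```python
import csv
import io

# Invented sample in the spirit of counties_population_cso_2016.csv.
# The 'population' column name is an assumption, not taken from the file.
SAMPLE_POPULATION_CSV = """county,population
Exampleshire,500000
Sampleton,125000
"""


def load_populations(csv_text):
    """Map county name to population; strips stray spaces from names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["county"].strip(): int(row["population"]) for row in reader}


def cases_per_million(cases, population):
    """Express a case count per million population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return cases * 1_000_000 / population


populations = load_populations(SAMPLE_POPULATION_CSV)
rate = cases_per_million(250, populations["Exampleshire"])
```

The same function applies to the age-group estimates, using the age group's population as the denominator.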
This dataset has been temporarily removed as of April 3, 2023. This dataset is still being maintained internally and will be restored after internal processes undergo a transition. Similar data are available from CDPH: https://calcat.covid19.ca.gov/cacovidmodels/.
This dataset displays SARS-CoV-2 lineages identified through whole genome sequencing (WGS) in Marin County by the date the sample was collected. There is a minimum 7-day (and up to 21-day) lag in reporting. Not all positive samples in Marin County are sequenced, so these data may not fully represent the variants circulating in the community. Summarized data can be found in the "Prevalence of Variants of SARS-CoV-2 in Marin County" chart at https://coronavirus.marinhhs.org/surveillance.
More information about variants of SARS-CoV-2 can be found via the CDC at https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-info.html
The ongoing SARS-CoV-2 pandemic has devastated the global economy and claimed more than one million lives, presenting an urgent global health crisis. To identify host factors required for infection by SARS-CoV-2 and seasonal coronaviruses, we designed a focused high-coverage CRISPR-Cas9 library targeting 332 members of a recently published SARS-CoV-2 protein interactome. We leveraged the compact nature of this library to systematically screen SARS-CoV-2 at two physiologically relevant temperatures (33 °C and 37 °C) along with three related coronaviruses (HCoV-229E, HCoV-NL63, and HCoV-OC43), allowing us to probe this interactome at a much higher resolution relative to genome-scale studies. This approach yielded several new insights, including unexpected virus-specific differences in Rab GTPase requirements and GPI anchor biosynthesis, as well as identification of multiple pan-coronavirus factors involved in cholesterol homeostasis. This coronavirus essentiality catalog could inform ongoing drug development efforts aimed at intercepting and treating COVID-19, and help prepare for future coronavirus outbreaks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Covid-19 pandemic has been one of the most disruptive and painful phenomena of the last few decades. As of July 2021, the origins of the SARS-CoV-2 virus that caused the outbreak remain a mystery. This work analyzes the prevalence in news media articles of two popular hypotheses about SARS-CoV-2 virus origins: the natural emergence and the lab-leak hypotheses.
This data set contains frequency counts of target words in news and opinion articles from 12 popular news media outlets. The target words are listed in the associated manuscript and are mostly words associated with the Covid-19 pandemic.
The compressed files in this data set are:
targetWordsInArticlesCounts.rar contains counts of target words in outlets' articles as well as total counts of words in articles
targetWordsFrequencies.rar contains daily, weekly, and monthly word frequencies
wordEmbeddingModels.rar contains monthly embedding models of news outlets' content
analysisScripts.rar contains analysis notebooks
The textual content of news and opinion articles from the outlets is available in the outlets' online domains and/or public cache repositories such as Google cache, The Internet Wayback Machine, and Common Crawl. We used word frequency counts derived from these sources. The textual content included in our analysis is limited to article headlines and the main body of text of the articles and does not include other article elements such as figure captions.
Targeted textual content was located in raw HTML data using outlet-specific XPath expressions. Tokens were lowercased prior to estimating frequency counts.
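The extraction step described above can be sketched roughly as follows. This is not the authors' code: the XPath expression, the class name 'article-body' and the tokenization rule are hypothetical, and the sketch uses Python's standard-library ElementTree, which supports only a subset of XPath and requires well-formed markup (real-world HTML usually needs a dedicated HTML parser).

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical outlet-specific expression; each outlet would need its own.
ARTICLE_XPATH = ".//div[@class='article-body']/p"


def extract_tokens(markup: str, xpath: str = ARTICLE_XPATH) -> List[str]:
    """Select the article's main text and return lowercased tokens."""
    root = ET.fromstring(markup)
    paragraphs = root.findall(xpath)
    text = " ".join("".join(p.itertext()) for p in paragraphs)
    # Lowercase first, then keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


# Invented, well-formed sample page for illustration only.
SAMPLE_PAGE = (
    "<html><body>"
    "<div class='article-body'>"
    "<p>The Lab-Leak hypothesis.</p><p>Natural emergence?</p>"
    "</div>"
    "<div class='footer'><p>Related articles</p></div>"
    "</body></html>"
)

tokens = extract_tokens(SAMPLE_PAGE)
```

In this sample the footer paragraph is excluded because it does not match the expression, which is the behaviour an outlet-specific expression is meant to achieve.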
The frequency of a target word in an outlet in any given temporal interval (daily, weekly, or monthly) was estimated by dividing the total number of occurrences of the target word in all articles of that interval by the total number of words in all articles of that interval. This method of estimating frequency accounts for the variable volume of total article output over time.
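The frequency estimate described above amounts to a single ratio per outlet and interval. A minimal Python sketch, with an invented function name and invented token lists:

```python
from typing import List


def interval_frequency(articles: List[List[str]], target: str) -> float:
    """Relative frequency of `target` across all articles of one interval.

    `articles` holds one list of lowercased tokens per article, all from
    the same outlet and the same daily, weekly, or monthly interval.
    """
    total_words = sum(len(tokens) for tokens in articles)
    if total_words == 0:
        return 0.0  # no article text in this interval
    occurrences = sum(tokens.count(target) for tokens in articles)
    return occurrences / total_words


# Invented token lists for two articles in the same interval.
articles = [
    ["the", "lab", "leak", "theory"],
    ["virus", "origins", "lab", "report", "today", "news"],
]

freq = interval_frequency(articles, "lab")
```

Dividing by the total word count of the interval, rather than by the number of articles, is what makes intervals with different article volumes comparable.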
In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the article due to the heterogeneity of HTML elements and CSS styling combinations with which article text is arranged in outlets' online domains. As a result, the total and target word counts for a small subset of articles are not precise. In a random sample of articles and outlets, manual estimates of target word counts matched the automatically derived counts for over 90% of the articles. Most of the incorrect frequency counts are minor deviations from the actual counts, for instance counting a word in an article footnote that encourages readers to find related articles and that the XPath expression mistakenly includes as part of the article's main text. Some additional outlet-specific inaccuracies that we could identify occurred in the WSJ, where XPath expressions failed to capture the article's main text in fewer than 5% of the articles. Article sample sizes for other outlets might not be comprehensive but, to the best of our knowledge, they are representative and include tens of thousands of articles per outlet per year. To conclude, in a data analysis of over 1.5 million articles, we cannot manually check the correctness of frequency counts for every single article, and one hundred percent accuracy in capturing articles' content is elusive because of a small number of difficult-to-detect boundary cases, such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our frequency metrics are representative of word prevalence in print news media content (see Figure 1 of the main manuscript for supporting evidence).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The full text of this article can be freely accessed on the publisher's website.
This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. The processing of SARS-CoV-2 Sequence Read Archive (SRA) files has been optimized to identify genetic variations in viral samples. This information is then presented in the Variant Call Format (VCF). Each VCF file corresponds to the SRA parent-run's accession ID. Additionally, the data is available in the parquet format, making it easier to search and filter using the Amazon Athena Service. The SARS-CoV-2 Variant Calling Pipeline is designed to handle new data every six hours, with updates to the AWS ODP bucket occurring daily.