6 datasets found

COVID-19 Variant Data (ARCHIVED)
data.ca.gov
healthdata.gov
+4 more
csv, xlsx, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). COVID-19 Variant Data (ARCHIVED) [Dataset]. https://data.ca.gov/dataset/covid-19-variant-data-archived
Available download formats: xlsx, csv, zip
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: This dataset is no longer being updated due to the end of the COVID-19 Public Health Emergency.

The California Department of Public Health (CDPH) is identifying the prevalence of circulating SARS-CoV-2 variants by analyzing CDPH Genomic Surveillance Data and CalREDIE, CDPH's communicable disease reporting and surveillance system. Viruses mutate into new strains or variants over time. Some variants emerge and then disappear. Other variants become common and circulate for a long time. Several specialized laboratories statewide sequence the genomes of a fraction of all positive COVID-19 tests to determine which variants are circulating. Sequencing and reporting of variant results takes several days after a test is identified as a positive for COVID-19. Not all viruses from positive COVID-19 tests are sequenced. Knowing what variants are circulating in California informs public health and clinical action.

Note: There is a natural reporting lag in these data because whole genome sequencing takes time to complete; therefore, a 14-day lag is applied to these datasets to allow for data completeness. More recent data should be used with caution.

For more information, please see: https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID-Variants.aspx
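The reporting-lag note above can be sketched in code. This is a minimal illustration, not CDPH's method: it treats rows dated within 14 days of a chosen reference date as provisional. The field names (`date`, `variant`, `specimens`) and the function `split_by_lag` are hypothetical and are not taken from the dataset's actual schema.

```python
from datetime import date, timedelta

LAG_DAYS = 14  # lag length stated in the dataset description


def split_by_lag(rows, as_of, lag_days=LAG_DAYS):
    """Split rows into (settled, provisional) around the cutoff as_of - lag_days.

    Rows dated on or before the cutoff are treated as settled; later rows
    are treated as provisional. Using the lag as the caution window is an
    assumption made for this sketch.
    """
    cutoff = as_of - timedelta(days=lag_days)
    settled = [r for r in rows if r["date"] <= cutoff]
    provisional = [r for r in rows if r["date"] > cutoff]
    return settled, provisional


# Hypothetical rows; the values are made up for illustration only.
rows = [
    {"date": date(2023, 3, 1), "variant": "A", "specimens": 10},
    {"date": date(2023, 3, 10), "variant": "A", "specimens": 7},
    {"date": date(2023, 3, 20), "variant": "B", "specimens": 2},
]

settled, provisional = split_by_lag(rows, as_of=date(2023, 3, 24))
```

With a reference date of 2023-03-24 the cutoff is 2023-03-10, so the first two rows are settled and the last is provisional.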
COVID-19 Variant Data
kaggle.com
zip
Updated Mar 12, 2023
+ more versions
Cite
Nidhi Sharma (2023). COVID-19 Variant Data [Dataset]. https://www.kaggle.com/datasets/nidzsharma/covid-19-variant-data
Available download formats: zip (68659 bytes)
Dataset updated
Mar 12, 2023
Authors
Nidhi Sharma
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The California Department of Public Health (CDPH) is identifying the prevalence of circulating SARS-CoV-2 variants by analysing CDPH Genomic Surveillance Data and CalREDIE, CDPH's communicable disease reporting and surveillance system. Viruses mutate into new strains or variants over time. Some variants emerge and then disappear. Other variants become common and circulate for a long time. Several specialized laboratories state-wide sequence the genomes of a fraction of all positive COVID-19 tests to determine which variants are circulating. Sequencing and reporting of variant results takes several days after a test is identified as a positive for COVID-19. Not all viruses from positive COVID-19 tests are sequenced. Knowing what variants are circulating in California informs public health and clinical action.
CDPH-CalCAT Modeling Nowcasts and Forecasts for COVID-19 and Influenza
data.chhs.ca.gov
data.ca.gov
+2 more
csv, parquet, zip
Updated Nov 28, 2025
Cite
California Department of Public Health (2025). CDPH-CalCAT Modeling Nowcasts and Forecasts for COVID-19 and Influenza [Dataset]. https://data.chhs.ca.gov/dataset/calcat
Available download formats: zip, csv (14373623), csv (679), csv (702), csv (2051609), csv (7696178), csv (649), parquet (183476911), zip (14740419)
Dataset updated
Nov 28, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset includes three tables with the model-based projections and estimates as shown on CalCAT in 2025 (http://calcat.cdph.ca.gov) for California state, regions, and counties.

(1) COVID-19 Nowcasts includes the R-effective estimates for COVID-19 from the different models available for the past 80 days from the archive date and the median ensemble thereof.

(2) CalCAT Forecasts includes hospital census and admissions forecasts for COVID-19 and Influenza, and the corresponding ensemble metrics for a 4-week horizon from the archive date.

(3) Variant Proportion Nowcasts contains the Integrated Genomic Epidemiology Dataset (IGED)-based and Terra-based estimates of COVID-19 variants circulating over the past 3 months as well as model-based predictions for the proportions of the variants of concern for dates leading up to the archive date. Prediction intervals are included when available.

This dataset provides CalCAT users with programmatic access to the downloadable datasets on CalCAT.

This dataset also includes a zipped file with the historical archives of the COVID-19 Nowcasts, CalCAT Forecasts and Variant Proportion Nowcasts through 2023.
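The "median ensemble" mentioned for the COVID-19 Nowcasts table can be illustrated with a short sketch. It assumes the ensemble is a per-date median across the available model estimates; the description does not spell out the exact calculation, so this is only an illustration. The model names, dates, values, and the function `median_ensemble` are hypothetical.

```python
from statistics import median


def median_ensemble(estimates_by_model):
    """Return {date: median R-effective across the models reporting that date}.

    estimates_by_model maps a model name to a {date_string: estimate} dict.
    A model that has no estimate for a date is left out of that date's median.
    """
    all_dates = sorted(
        {d for series in estimates_by_model.values() for d in series}
    )
    return {
        d: median(
            series[d] for series in estimates_by_model.values() if d in series
        )
        for d in all_dates
    }


# Hypothetical estimates; the values are made up for illustration only.
estimates = {
    "model_a": {"2025-01-01": 0.75, "2025-01-02": 0.5},
    "model_b": {"2025-01-01": 1.0, "2025-01-02": 1.5},
    "model_c": {"2025-01-01": 1.25},
}

ensemble = median_ensemble(estimates)
```

With an even number of models on a date, `statistics.median` averages the two middle values, which is one common convention and may differ from what CalCAT does.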
SARS-CoV-2 Delta variant genomic variation associated with breakthrough...
immport.org
data.niaid.nih.gov
+1 more
url
Updated May 17, 2023
Cite
(2023). SARS-CoV-2 Delta variant genomic variation associated with breakthrough infection in Northern California: A retrospective cohort study [Dataset]. http://doi.org/10.21430/M3CTYNVL3V
Available download formats: url
Unique identifier
https://doi.org/10.21430/M3CTYNVL3V
Dataset updated
May 17, 2023
License
https://www.immport.org/agreement
Description
To characterize the genomic variation within a circulating variant and identify potential mutations associated with breakthrough infection among persons with Delta variant SARS-CoV-2 infection.
Data from: Risk of severe clinical outcomes among persons with SARS-CoV-2...
immport.org
data.niaid.nih.gov
+1 more
url
Cite
Risk of severe clinical outcomes among persons with SARS-CoV-2 infection with differing levels of vaccination during widespread Omicron (B.1.1.529) and Delta (B.1.617.2) variant circulation in Northern California: A retrospective cohort study [Dataset]. http://doi.org/10.21430/M3FXU7B7MZ
Available download formats: url
Unique identifier
https://doi.org/10.21430/M3FXU7B7MZ
License
https://www.immport.org/agreement
Description
To identify risk factors for severe clinical outcomes among persons with SARS-CoV-2 infection and persons with varying vaccination status for COVID-19 during periods of Omicron versus Delta variant circulation
Understanding shared variation in SARS-CoV-2 genomes
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
+1 more
zip
Updated Aug 28, 2022
Cite
Stacia Wyman (2022). Understanding shared variation in SARS-CoV-2 genomes [Dataset]. http://doi.org/10.6078/D1JQ5C
Available download formats: zip
Unique identifier
https://doi.org/10.6078/D1JQ5C
Dataset updated
Aug 28, 2022
Dataset provided by
University of California, Berkeley
Authors
Stacia Wyman
License
https://spdx.org/licenses/CC0-1.0.html
Description
The project is a collaborative effort of investigators from the University of California, Berkeley’s Innovative Genomics Institute (IGI) and School of Public Health (SPH); Kaiser Permanente Northern California (KPNC); and the California Department of Public Health (CDPH), with administrative and programmatic support provided by Heluna Health. Over the project period, the collaborating investigators will analyze approximately 35,000 genomes of SARS-CoV-2 specimens obtained from KPNC members and sequenced by the CDPH through its COVIDNet activities. By combining results from the genomic analysis of low-frequency alleles with clinical and epidemiologic data available in patient records, including demographic variables, COVID-19 vaccination status (dates of vaccination; number of doses; manufacturer), COVID-19 disease severity, and underlying medical conditions, we assessed which shared genomic variations are associated with a greater risk of symptomatic infection and severe clinical outcomes; COVID-19 vaccine effectiveness; and transmission of SARS-CoV-2 in the household. The project and its results can serve as a model for community-based monitoring of the evolution and spread of SARS-CoV-2 and use of the data to inform decisions about the formulation and use of COVID-19 vaccines, including booster doses and next-generation vaccines.

Methods

Sample collection
Our samples are from Kaiser Northern California patients testing positive for SARS-CoV-2 starting June 1, 2021, and through the present. The RNA is sent to the California Department of Public Health (CDPH) lab to be sequenced by COVIDNet, a consortium of primarily UC system labs helping CDPH with the overflow and backlog of samples. Once the genomes have been sequenced, the lineage information and unique deidentified PAUI number are returned to Kaiser where this information is recorded. Metadata from this list of PAUI’s is sent weekly to UC Berkeley. The KPNC sequencing data is returned to us through a third party that is processing all CDPH genomes and stored on a server at UC Berkeley and matched with metadata using PAUI’s.

Sequence analysis
The raw sequencing data is processed through a SARS-CoV-2 analysis pipeline that has been modified for this work as follows. Adapter removal and trimming are performed using bbduk. The reads are then aligned to the Wuhan reference genome using minimap2 followed by primer trimming using iVAR. We next create a pileup file using samtools and use that input to create a consensus file. This consensus file is created with iVAR using a minimum depth of 10 reads and majority rule for base calling. We next use iVAR to call variants from the pileup file where we set the threshold for calling a mutation to be 0.01. This will call mutations for any loci where at least one percent of the reads are non-reference. This very low threshold allows us to capture all variation that is seen in the sequencing data. The list of variants is then annotated with the gene and amino acid change (if there is one), and whether the mutation is considered defining in any SARS-CoV-2 variants and whether that mutation is seen in only one variant. This dataset includes the fasta consensus sequences and mutation calls for each genome.
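The two thresholds named in the sequence analysis (a minimum depth of 10 reads for consensus calls and a 0.01 frequency threshold for mutation calls) can be shown with a toy example on per-position base counts. This is a simplified sketch and not the pipeline's or iVAR's actual implementation. The functions `consensus_base` and `call_variants`, the alphabetical tie-break, the use of "N" for low-depth positions, and the per-allele reading of the threshold are all assumptions made here.

```python
from collections import Counter

MIN_DEPTH = 10       # minimum depth for a consensus call, from the description
MIN_ALT_FREQ = 0.01  # mutation-calling threshold, from the description


def consensus_base(counts, min_depth=MIN_DEPTH):
    """Return the most frequent base at a position, or "N" if depth is too low.

    Ties are broken alphabetically; that choice is an assumption of this sketch.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"
    return min(counts, key=lambda base: (-counts[base], base))


def call_variants(ref_base, counts, min_freq=MIN_ALT_FREQ):
    """Return sorted (base, frequency) pairs for non-reference bases.

    A base is reported when its share of reads at the position is at least
    min_freq. Applying the threshold per allele is an assumption.
    """
    depth = sum(counts.values())
    if depth == 0:
        return []
    return sorted(
        (base, n / depth)
        for base, n in counts.items()
        if base != ref_base and n / depth >= min_freq
    )


# Hypothetical read counts at one position with reference base "A".
counts = Counter({"A": 985, "G": 12, "T": 3})

consensus = consensus_base(counts)   # "A": depth 1000, A is most frequent
variants = call_variants("A", counts)  # G at 1.2% is kept, T at 0.3% is not
```

Real pipelines also account for base quality, indels, and strand information, which this sketch ignores.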