6 datasets found
  1. COVID-19 Variant Data (ARCHIVED)

    • data.ca.gov
    • healthdata.gov
    • +4 more
    csv, xlsx, zip
    Updated Nov 7, 2025
    Cite
    California Department of Public Health (2025). COVID-19 Variant Data (ARCHIVED) [Dataset]. https://data.ca.gov/dataset/covid-19-variant-data-archived
    Available download formats
    xlsx, csv, zip
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Note: This dataset is no longer being updated due to the end of the COVID-19 Public Health Emergency.

    The California Department of Public Health (CDPH) is identifying the prevalence of circulating SARS-CoV-2 variants by analyzing CDPH Genomic Surveillance Data and CalREDIE, CDPH's communicable disease reporting and surveillance system. Viruses mutate into new strains or variants over time. Some variants emerge and then disappear. Other variants become common and circulate for a long time. Several specialized laboratories statewide sequence the genomes of a fraction of all positive COVID-19 tests to determine which variants are circulating. Sequencing and reporting of variant results take several days after a test is identified as positive for COVID-19. Not all viruses from positive COVID-19 tests are sequenced. Knowing what variants are circulating in California informs public health and clinical action.

    Note: There is a natural reporting lag in these data because whole genome sequencing takes time to complete; therefore, a 14-day lag is applied to these datasets to allow for data completeness. Please note that more recent data should be used with caution.

    For more information, please see: https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID-Variants.aspx
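
The reporting-lag note above suggests treating recent rows differently from older ones. A minimal Python sketch of that idea follows; the row schema (`date`, `variant_name`, `specimens`), the sample values, and the function name `split_by_reporting_lag` are assumptions for illustration only and are not taken from the dataset's documentation.

```python
from datetime import date, timedelta


def split_by_reporting_lag(rows, as_of, lag_days=14):
    """Split rows into (settled, provisional) around a reporting-lag cutoff.

    rows: iterable of dicts with a "date" key holding a datetime.date
          (hypothetical schema, not the published column layout).
    as_of: the date the data was retrieved or archived.
    lag_days: rows newer than as_of - lag_days are treated as provisional.
    """
    cutoff = as_of - timedelta(days=lag_days)
    settled = [r for r in rows if r["date"] <= cutoff]
    provisional = [r for r in rows if r["date"] > cutoff]
    return settled, provisional


# Illustrative rows only; these are not real surveillance counts.
rows = [
    {"date": date(2023, 1, 1), "variant_name": "Omicron", "specimens": 120},
    {"date": date(2023, 1, 20), "variant_name": "Omicron", "specimens": 45},
    {"date": date(2023, 1, 28), "variant_name": "Omicron", "specimens": 7},
]
settled, provisional = split_by_reporting_lag(rows, as_of=date(2023, 1, 30))
```

Keeping the provisional rows in a separate list, rather than dropping them, lets an analysis still display them while marking them as incomplete.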

  2. COVID-19 Variant Data

    • kaggle.com
    zip
    Updated Mar 12, 2023
    + more versions
    Cite
    Nidhi Sharma (2023). COVID-19 Variant Data [Dataset]. https://www.kaggle.com/datasets/nidzsharma/covid-19-variant-data
    Available download formats
    zip (68659 bytes)
    Dataset updated
    Mar 12, 2023
    Authors
    Nidhi Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The California Department of Public Health (CDPH) is identifying the prevalence of circulating SARS-CoV-2 variants by analysing CDPH Genomic Surveillance Data and CalREDIE, CDPH's communicable disease reporting and surveillance system. Viruses mutate into new strains or variants over time. Some variants emerge and then disappear. Other variants become common and circulate for a long time. Several specialized laboratories state-wide sequence the genomes of a fraction of all positive COVID-19 tests to determine which variants are circulating. Sequencing and reporting of variant results take several days after a test is identified as positive for COVID-19. Not all viruses from positive COVID-19 tests are sequenced. Knowing what variants are circulating in California informs public health and clinical action.

  3. CDPH-CalCAT Modeling Nowcasts and Forecasts for COVID-19 and Influenza

    • data.chhs.ca.gov
    • data.ca.gov
    • +2 more
    csv, parquet, zip
    Updated Nov 28, 2025
    Cite
    California Department of Public Health (2025). CDPH-CalCAT Modeling Nowcasts and Forecasts for COVID-19 and Influenza [Dataset]. https://data.chhs.ca.gov/dataset/calcat
    Available download formats
    zip, csv (14373623), csv (679), csv (702), csv (2051609), csv (7696178), csv (649), parquet (183476911), zip (14740419)
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    This dataset includes three tables with the model-based projections and estimates as shown on CalCAT in 2025 (http://calcat.cdph.ca.gov) for California state, regions, and counties.

    (1) COVID-19 Nowcasts includes the R-effective estimates for COVID-19 from the different models available for the past 80 days from the archive date and the median ensemble thereof.

    (2) CalCAT Forecasts includes hospital census and admissions forecasts for COVID-19 and Influenza, and the corresponding ensemble metrics for a 4-week horizon from the archive date.

    (3) Variant Proportion Nowcasts contains the Integrated Genomic Epidemiology Dataset (IGED)-based and Terra-based estimates of COVID-19 variants circulating over the past 3 months as well as model-based predictions for the proportions of the variants of concern for dates leading up to the archive date. Prediction intervals are included when available.

    This dataset provides CalCAT users with programmatic access to the downloadable datasets on CalCAT.

    This dataset also includes a zipped file with the historical archives of the COVID-19 Nowcasts, CalCAT Forecasts and Variant Proportion Nowcasts through 2023.
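
The COVID-19 Nowcasts table is described as holding per-model R-effective estimates together with their median ensemble. A minimal Python sketch of a per-date median across models follows; the tuple layout, the model names, and the values are assumptions for illustration, and the description does not say how CalCAT selects or weights the models it combines.

```python
from collections import defaultdict
from statistics import median


def median_ensemble(estimates):
    """Return {date: median R-effective across models} for each date.

    estimates: iterable of (date_str, model_name, r_eff) tuples
               (hypothetical layout, not the published CSV schema).
    """
    by_date = defaultdict(list)
    for day, _model, r_eff in estimates:
        by_date[day].append(r_eff)
    # statistics.median averages the two middle values when the count is even.
    return {day: median(values) for day, values in sorted(by_date.items())}


# Illustrative values only; these are not CalCAT outputs.
estimates = [
    ("2025-01-01", "model_a", 0.90),
    ("2025-01-01", "model_b", 1.10),
    ("2025-01-01", "model_c", 1.00),
    ("2025-01-02", "model_a", 0.95),
    ("2025-01-02", "model_b", 1.05),
]
ensemble = median_ensemble(estimates)
```

A median is less sensitive than a mean to a single model that diverges from the rest, which is one common reason to prefer it for ensembles of this kind.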

  4. SARS-CoV-2 Delta variant genomic variation associated with breakthrough...

    • immport.org
    • data.niaid.nih.gov
    • +1 more
    url
    Updated May 17, 2023
    Cite
    (2023). SARS-CoV-2 Delta variant genomic variation associated with breakthrough infection in Northern California: A retrospective cohort study [Dataset]. http://doi.org/10.21430/M3CTYNVL3V
    Available download formats
    url
    Dataset updated
    May 17, 2023
    License

    https://www.immport.org/agreement

    Description

    To characterize the genomic variation within a circulating variant and identify potential mutations associated with breakthrough infection among persons with Delta variant SARS-CoV-2 infection.

  5. Data from: Risk of severe clinical outcomes among persons with SARS-CoV-2...

    • immport.org
    • data.niaid.nih.gov
    • +1 more
    url
    Cite
    Risk of severe clinical outcomes among persons with SARS-CoV-2 infection with differing levels of vaccination during widespread Omicron (B.1.1.529) and Delta (B.1.617.2) variant circulation in Northern California: A retrospective cohort study [Dataset]. http://doi.org/10.21430/M3FXU7B7MZ
    Available download formats
    url
    License

    https://www.immport.org/agreement

    Description

    To identify risk factors for severe clinical outcomes among persons with SARS-CoV-2 infection and persons with varying vaccination status for COVID-19 during periods of Omicron versus Delta variant circulation.

  6. Understanding shared variation in SARS-CoV-2 genomes

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Aug 28, 2022
    Cite
    Stacia Wyman (2022). Understanding shared variation in SARS-CoV-2 genomes [Dataset]. http://doi.org/10.6078/D1JQ5C
    Available download formats
    zip
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    University of California, Berkeley
    Authors
    Stacia Wyman
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The project is a collaborative effort of investigators from the University of California, Berkeley’s Innovative Genomics Institute (IGI) and School of Public Health (SPH); Kaiser Permanente Northern California (KPNC); and the California Department of Public Health (CDPH), with administrative and programmatic support provided by Heluna Health. Over the project period, the collaborating investigators will analyze approximately 35,000 genomes of SARS-CoV-2 specimens obtained from KPNC members and sequenced by the CDPH through its COVIDNet activities. By combining results from the genomic analysis of low-frequency alleles with clinical and epidemiologic data available in patient records, including demographic variables, COVID-19 vaccination status (dates of vaccination; number of doses; manufacturer), COVID-19 disease severity, and underlying medical conditions, we assessed which shared genomic variations are associated with a greater risk of symptomatic infection and severe clinical outcomes; COVID-19 vaccine effectiveness; and transmission of SARS-CoV-2 in the household. The project and its results can serve as a model for community-based monitoring of the evolution and spread of SARS-CoV-2 and use of the data to inform decisions about the formulation and use of COVID-19 vaccines, including booster doses and next-generation vaccines.

    Methods

    Sample collection

    Our samples are from Kaiser Northern California patients testing positive for SARS-CoV-2 starting June 1, 2021, and through the present. The RNA is sent to the California Department of Public Health (CDPH) lab to be sequenced by COVIDNet, a consortium of primarily UC system labs helping CDPH with the overflow and backlog of samples. Once the genomes have been sequenced, the lineage information and unique deidentified PAUI number are returned to Kaiser, where this information is recorded. Metadata from this list of PAUIs is sent weekly to UC Berkeley. The KPNC sequencing data is returned to us through a third party that is processing all CDPH genomes; it is stored on a server at UC Berkeley and matched with metadata using PAUIs.

    Sequence analysis

    The raw sequencing data is processed through a SARS-CoV-2 analysis pipeline that has been modified for this work as follows. Adapter removal and trimming are performed using bbduk. The reads are then aligned to the Wuhan reference genome using minimap2, followed by primer trimming using iVAR. We next create a pileup file using samtools and use it as input to create a consensus file. This consensus file is created with iVAR using a minimum depth of 10 reads and majority rule for base calling. We next use iVAR to call variants from the pileup file, with the threshold for calling a mutation set to 0.01. This calls mutations at any locus where at least one percent of the reads are non-reference. This very low threshold allows us to capture all variation that is seen in the sequencing data. The list of variants is then annotated with the gene and amino acid change (if there is one), whether the mutation is considered defining in any SARS-CoV-2 variant, and whether that mutation is seen in only one variant. This dataset includes the fasta consensus sequences and mutation calls for each genome.
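
The two thresholds in the sequence analysis description (minimum depth of 10 with majority rule for the consensus, and a 0.01 frequency cutoff for mutation calls) can be illustrated on per-position base counts. The Python sketch below is not iVAR's implementation; the function names, the `N` placeholder for low-depth positions, and the example counts are assumptions for illustration.

```python
MIN_DEPTH = 10        # minimum reads required to call a consensus base
MIN_ALT_FREQ = 0.01   # minimum non-reference read fraction to report a mutation


def call_consensus_base(counts, min_depth=MIN_DEPTH):
    """Majority-rule base call from {base: read_count}.

    Returns "N" when depth is below min_depth (assumed placeholder).
    Ties go to the base listed first in the dict.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"
    return max(counts, key=counts.get)


def call_variants(ref_base, counts, min_freq=MIN_ALT_FREQ):
    """Return sorted (alt_base, frequency) pairs at or above min_freq."""
    depth = sum(counts.values())
    if depth == 0:
        return []
    return sorted(
        (base, n / depth)
        for base, n in counts.items()
        if base != ref_base and n / depth >= min_freq
    )


# Illustrative counts for one position with reference base "A".
pileup_counts = {"A": 980, "G": 15, "T": 5}
consensus_base = call_consensus_base(pileup_counts)
variants = call_variants("A", pileup_counts)
low_depth_base = call_consensus_base({"A": 5, "G": 3})
```

In this example the consensus stays at the reference base while the low-frequency `G` allele (1.5% of reads) is still reported and the `T` allele (0.5%) is not, which is the distinction the one-percent threshold is meant to capture.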
