Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source, and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. the United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
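Because one dataset can hold several overlapping series, a reasonable first step is to isolate a single series before summing counts. The following is a minimal sketch in Python, not the Project Tycho team's own procedure. The field names (`source`, `location`, `period_start`, `count`) and the sample rows are hypothetical and do not reflect actual Project Tycho column names or data.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
records = [
    {"source": "CDC", "location": "US", "period_start": "2000-01-02", "count": 3},
    {"source": "WHO", "location": "US", "period_start": "2000-01-02", "count": 3},
    {"source": "CDC", "location": "US", "period_start": "2000-01-09", "count": 5},
]


def select_series(rows, **attributes):
    """Keep only rows whose attributes match every given key/value pair."""
    return [r for r in rows if all(r.get(k) == v for k, v in attributes.items())]


def counts_by_period(rows):
    """Sum counts per time-interval start date."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["period_start"]] += r["count"]
    return dict(totals)


# One series: US cases as reported by a single source.
cdc_totals = counts_by_period(select_series(records, source="CDC", location="US"))

# Summing without selecting a series mixes sources and double counts cases.
all_totals = counts_by_period(records)
```

Comparing `cdc_totals` with `all_totals` shows why the series should be selected first: the first interval is counted once in the former and twice in the latter.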
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
https://www.usa.gov/government-works
Effective September 27, 2023, this dataset will no longer be updated. Similar data are accessible from wonder.cdc.gov.
Deaths involving COVID-19, pneumonia, and influenza reported to NCHS by sex, age group, and jurisdiction of occurrence.
https://spdx.org/licenses/CC0-1.0.html
Influenza A viruses (IAV) circulate endemically among many wild aquatic bird populations that seasonally migrate between wintering grounds in southern latitudes and breeding ranges along the perimeter of the circumpolar arctic. Arctic and subarctic zones are hypothesized to serve as ecologic drivers of the intercontinental movement and reassortment of IAVs due to high densities of disparate populations of long-distance migratory and native bird species present during breeding seasons. Iceland is a staging ground that connects the East Atlantic and North Atlantic American flyways, providing a unique study system for characterizing viral flow between eastern and western hemispheres. Using Bayesian phylodynamic analyses, we sought to evaluate the viral connectivity of Iceland to proximal regions and how inter-species transmission and reassortment dynamics in this region influence the geographic spread of low and highly pathogenic IAVs. Findings demonstrate that IAV movement in the arctic and subarctic follows seabird migration around the perimeter of the circumpolar north, favoring short-distance flights between proximal regions rather than long-distance flights over the polar interior. Iceland connects virus movement between mainland Europe and North America, particularly due to the westward migration of wild birds from mainland Europe to Northeastern Canada and Greenland. Though virus diffusion rates were similar among avian taxonomic groups in Iceland, gulls act as recipients and not sources of IAVs to other avian hosts prior to onward migration. These data identify patterns of virus movement in northern latitudes and inform future surveillance strategies related to seasonal and emergent IAVs with pandemic potential.
Methods
Field sample collection
From May 2010 through February 2018, we obtained IAV isolates from various species of seabirds, shorebirds, and waterfowl as well as environmental sampling of avian fecal material from locations throughout Iceland (capture and swab data can be found here: https://doi.org/10.5066/XX (Dusek et al. 202X)). Live sampled birds were captured using an 18 m x 12 m cannon-propelled capture net, noose pole, or hand capture. Birds found dead or moribund were also sampled. Hunter-harvested waterfowl and fisheries-bycatch seabirds were sampled as available. All birds were identified to species and, for live birds, individually marked with metal bands. Age characteristics were determined and age was documented for each bird according to the following schemes adapted from U.S. Geological Survey year classification codes: hatched in same calendar year as sampling (1CY), hatched previous calendar year (2CY), hatched previous calendar year or older, exact age unknown (2CY+), hatched three calendar years prior to sampling (3CY), hatched four calendar years prior to sampling (4CY), hatched more than four calendar years prior to sampling (4CY+), or unknown if age could not be determined (U) (Olsen KM, 2004; Prater, Marchant, & Vuorinen, 1977; USGS, 2020). Due to species-specific differences, not all aging categories could be applied to all species sampled. All live birds were immediately released following completion of sampling. To sample for IAV, a single polyester-tipped swab was used to swab the cloaca only (2010-2013) or to first swab the oral cavity then the cloaca (2014-2017). Opportunistic environmental sampling of fecal material was also conducted using a direct swabbing method (2018). Each swab sample was immediately placed in an individual cryovial containing 1.25 ml viral transport media (Docherty & Slota, 1988). Vials were held on ice for up to 5 hours prior to being stored in liquid nitrogen or liquid nitrogen vapor.
Samples were shipped on dry ice from Iceland to Madison, Wisconsin, USA by private courier with dry ice replenishment during shipping. Once received in the laboratory, samples were stored at -80 °C until analysis.
Virus extraction, RT-PCR, virus isolation
Viral RNA was extracted from swab samples using the MagMAX™-96 AI/ND Viral RNA Isolation Kit (Ambion, Austin, TX) following the manufacturer’s procedures. Real-time RT-PCR was performed using previously published procedures, primers, and probes (Spackman et al., 2002) designed to detect the IAV matrix gene. RT-PCR assays utilized reagents provided in the Qiagen OneStep® RT-PCR kit. Virus isolation was performed in embryonating chicken egg culture on all swab samples exhibiting positive Ct values from RT-PCR analysis (Woolcock, 2008), with a primary cut off value of 45 on primary screen and 22 on secondary screen. All virus isolates were screened for the presence of H5 and H7 IAV subtypes using primers and probes specific for those subtypes (Spackman et al., 2002). Egg-grown virus isolates were sequenced using multiple standard methods including Sanger, Roche 454, and Illumina (HiSeq 2000 and MiSeq) sequencing (Dusek et al., 2014; Guan et al., 2019; Hall et al., 2014).
Datasets for phylodynamic analyses
Global dataset: The PB2 segment was selected as the basis for phylodynamic analysis. Advantages of focusing on PB2 - the largest internal segment of the IAV genome - include maximizing the number of nucleotides in the analysis (>2000 nts) and investigating transmission dynamics without targeting a specific subtype. All available avian and marine mammal IAV PB2 genes sequenced between 2009 and 2019 globally were downloaded from the National Center for Biotechnology Information Influenza Virus Resource database (NCBI IVR) (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html) on February 12, 2020, resulting in 13,469 sequences.
Duplicate sequences (based on collection date, location, and nucleotide content) and sequences with less than 75% unambiguous bases were removed, and all vaccine derivative and laboratory-synthesized recombinant sequences were excluded. Sequences in the dataset were only included if isolation dates, location, and host species were available, resulting in 7,245 remaining sequences.
Downsampling of taxa: The downsampling strategy aimed to reduce the number of sequence taxa for computation and mitigate sampling bias while maintaining the genetic diversity in the dataset. Four variables were considered important for explaining genetic diversity in the IAV sequence dataset: geographic region, host taxa, sampling year, and hemagglutinin (HA) subtype. Geographic regions included North America, Europe, Iceland, Asia, Africa, and South America (Australia and Antarctica were removed due to insufficient sequence counts). HA subtypes included H1, H2, H3, H4, H6, H8, H10, H11, H12, H13, H14, H15, H16, and pooled H5/7/9. H5, H7, and H9 were combined, as these were over-represented in the global dataset. Host categories included Anseriformes, Charadriiformes, Galliformes, and Other, which comprised all other avian taxa and marine mammals. To inform the downsampling strategy and evaluate if any of the four variables were correlated, a multiple correspondence analysis (MCA) was performed (JMP Pro v.14.0.0 (JMP Version 14.0.0, 1989-2019)). The MCA uses categorical data as input, which for this study included the sampling metadata associated with each sequence (region, host taxa, year, and HA subtype). Through representation of the variables in two-dimensional Euclidean space, significant clustering of HA subtypes with host taxa was detected (Supplementary Fig. 1), indicative of host-specific subtypes that are a well-known feature of influenza.
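The sequence cleaning criteria described above (complete metadata, at least 75% unambiguous bases, and no duplicates by collection date, location, and nucleotide content) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the authors' pipeline: the record fields (`id`, `date`, `location`, `host`, `sequence`) and the sample records are hypothetical, unambiguous bases are taken to be A, C, G, and T, and the exclusion of vaccine-derived and recombinant sequences is not modeled.

```python
UNAMBIGUOUS = set("ACGT")  # assumption: any other symbol counts as ambiguous


def unambiguous_fraction(seq):
    """Fraction of positions that are A, C, G, or T (case-insensitive)."""
    if not seq:
        return 0.0
    return sum(base in UNAMBIGUOUS for base in seq.upper()) / len(seq)


def clean_sequences(records, min_unambiguous=0.75):
    """Drop records with missing metadata, low-quality sequence, or duplicates."""
    seen = set()
    kept = []
    for rec in records:
        # Require isolation date, location, and host species.
        if not all(rec.get(k) for k in ("date", "location", "host")):
            continue
        # Require at least 75% unambiguous bases by default.
        if unambiguous_fraction(rec["sequence"]) < min_unambiguous:
            continue
        # Duplicates share collection date, location, and nucleotide content.
        key = (rec["date"], rec["location"], rec["sequence"].upper())
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical records for illustration only.
records = [
    {"id": "a", "date": "2015-05-01", "location": "Iceland", "host": "gull", "sequence": "ACGTACGT"},
    {"id": "b", "date": "2015-05-01", "location": "Iceland", "host": "gull", "sequence": "acgtacgt"},
    {"id": "c", "date": "2015-05-02", "location": "Iceland", "host": "gull", "sequence": "ACNNNNGT"},
    {"id": "d", "date": "2016-06-01", "location": "Iceland", "host": None, "sequence": "ACGTACGT"},
]
cleaned = clean_sequences(records)
```

In this toy input, record `b` is a duplicate of `a`, record `c` has only half of its bases unambiguous, and record `d` lacks a host, so only `a` is retained.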
These findings, confirmed by previously published data on species-specificity of HA subtypes (Byrd-Leotis, Cummings, & Steinhauer, 2017; Long, Mistry, Haslam, & Barclay, 2019; Verhagen et al., 2015), led us to downsample the dataset by stratifying taxa by two non-overlapping variables: geographic region and HA subtype. Data were downsampled to maintain 21-75 taxa per geographic region category and 6-30 per HA subtype category, resulting in a total of 301 sequences (outgroup). This step was performed to mitigate sampling bias resulting in over-representation of species or viral strains, while accounting for genetic diversity in the dataset. Next, to ensure relative evenness of geographic state groupings for discrete trait analyses, virus sequences from Iceland (n=93) were downsampled by stratifying taxa by HA subtype and maintaining 1-15 sequences per category, resulting in 63 sequences (ingroup). These 63 sequences were used for global and local discrete trait analyses and reflected the composition of diverse subtypes by host for the full Iceland sequence dataset. The resulting dataset reflected the underlying composition of host-specific subtypes present in this localized system. To assist with rooting and time-calibration of the tree, historical avian sequences from NCBI IVR were downloaded for the years 1979-2008. These were downsampled by year to ensure one sequence per year, resulting in 30 historic sequences. The total downsampled dataset, including the outgroup (n=301), ingroup (n=63), and historic sequences (n=30), comprised 394 sequences.
Europe-Iceland-North America Datasets: To elucidate viral dynamics between significant source regions and Iceland and within-Iceland phylodynamics, a second analysis was performed at a scale restricted to Europe, Iceland, and North America. The cleaned global dataset described above (n=7245) was downsampled to include significant source regions of North America (n=3222) and Europe (n=407), totaling 3629 sequences.
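The general idea of stratified downsampling described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it applies a single maximum per stratum and no minimum, whereas the text describes separate ranges per geographic region and per HA subtype. The function name `downsample`, the record fields (`region`, `ha`, `id`), and the sample records are hypothetical.

```python
import random
from collections import defaultdict


def downsample(records, key, max_per_group, seed=0):
    """Randomly keep at most max_per_group records in each stratum defined by key."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    kept = []
    for name in sorted(groups):
        members = groups[name]
        if len(members) > max_per_group:
            members = rng.sample(members, max_per_group)
        kept.extend(members)
    return kept


# Hypothetical records: one over-represented stratum and one small stratum.
records = [{"region": "Europe", "ha": "H3", "id": i} for i in range(10)]
records += [{"region": "Iceland", "ha": "H13", "id": 100 + i} for i in range(2)]

subset = downsample(records, key=lambda r: (r["region"], r["ha"]), max_per_group=3)
```

Capping each stratum reduces the weight of over-represented groups while leaving small groups intact, which is the bias-mitigation effect the downsampling step aims for.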
To identify at lower spatial resolution the source/sink locations relevant to Iceland, a K-means cluster analysis was performed (JMP Pro v.14.0.0 (JMP Version 14.0.0, 1989-2019)) using latitude/longitude coordinates for each of the 3629 sequences (obtained by extracting the sampling location from the strain name of each sequence and searching in www.geonames.org). A solution with 20 intraregional clusters received the highest support. Identified clusters with <50 sequences were combined with geographically proximal
This data release contains model outputs depicting the probability of an H5 or H7 avian influenza outbreak at any given point in the continental United States for each week of the year.