Home Action Genome is a large-scale multi-view video database of indoor daily activities. Every activity is captured by synchronized multi-view cameras, including an egocentric view. There are 30 hours of videos with 70 classes of daily activities and 453 classes of atomic actions.
Action Genome Question Answering (AGQA) is a benchmark for compositional spatio-temporal reasoning. AGQA contains 192M unbalanced question answer pairs for 9.6K videos. It also contains a balanced subset of 3.9M question answer pairs, 3 orders of magnitude larger than existing benchmarks, that minimizes bias by balancing the answer distributions and types of question structures.
AGQA introduces multiple training/test splits to test for various reasoning abilities, including generalization to novel compositions, to indirect references, and to more compositional steps.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Table 3A-F from Genome-Wide siRNA Screen for Modulators of Cell Death Induced by Proteasome Inhibitor Bortezomib
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Efficient high-throughput transcriptomics (HTT) tools promise inexpensive, rapid assessment of possible biological consequences of human and environmental exposures to tens of thousands of chemicals in commerce. HTT systems have used relatively small sets of gene expression measurements coupled with mathematical prediction methods to estimate genome-wide gene expression and are often trained and validated using pharmaceutical compounds. It is unclear whether these training sets are suitable for general toxicity testing applications and the more diverse chemical space represented by commercial chemicals and environmental contaminants. In this work, we built predictive computational models that inferred whole genome transcriptional profiles from a smaller sample of surrogate genes. The model was trained and validated using a large-scale toxicogenomics database with gene expression data from exposure to heterogeneous chemicals from a wide range of classes (the Open TG-GATEs database). The method of predictor selection was designed to allow high-fidelity gene prediction from any pre-existing gene expression data set, regardless of animal species or data measurement platform. Predictive qualitative models were developed with the TG-GATEs data, which contained gene expression data of human primary hepatocytes with over 941 samples covering 158 compounds. A sequential forward search-based greedy algorithm, combining different fitting approaches and machine learning techniques, was used to find an optimal set of surrogate genes that predicted differential expression changes of the remaining genome. We then used pathway enrichment of up-regulated and down-regulated genes to assess the ability of a limited gene set to determine relevant patterns of tissue response.
In addition, we compared prediction performance using the surrogate genes found from our greedy algorithm (referred to as the SV2000) with the landmark genes provided by existing technologies such as L1000 (Genometry) and S1500 (Tox21), finding better predictive performance for the SV2000. The ability of these predictive algorithms to predict pathway-level responses is a positive step toward incorporating mode of action (MOA) analysis into the high-throughput prioritization and testing of the large number of chemicals in need of safety evaluation.
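The sequential forward search mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy expression values, the function names, and the scoring rule (squared Pearson correlation between each remaining gene and its best-matching selected gene) are all assumptions chosen for illustration, standing in for the fitting approaches and machine learning techniques the study combined.

```python
from statistics import mean

def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def coverage(selected, expr):
    """Sum, over genes not yet selected, of the best r^2 to any selected gene."""
    return sum(
        max(pearson_r2(expr[g], expr[s]) for s in selected)
        for g in expr
        if g not in selected
    )

def greedy_forward_selection(expr, k):
    """Pick k surrogate genes, adding at each step the gene that maximizes the score."""
    selected = []
    for _ in range(k):
        candidates = [g for g in expr if g not in selected]
        best = max(candidates, key=lambda g: coverage(selected + [g], expr))
        selected.append(best)
    return selected

# Hypothetical toy data: gene name -> expression across five samples.
toy_expr = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with A
    "C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "D": [10.0, 2.0, 8.0, 4.0, 6.0],  # perfectly correlated with C
}
surrogates = greedy_forward_selection(toy_expr, 2)
print(surrogates)
```

With this toy data, the two chosen surrogates come from different correlated pairs (one of A/B and one of C/D), because a second gene from the same pair adds almost nothing to the score. Which member of each pair is picked depends on tie-breaking.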
Nitrogen-containing-bisphosphonates (N-BPs) are widely prescribed to treat osteoporosis and other bone-related diseases. Although previous studies established that N-BPs function by inhibiting the mevalonate pathway in osteoclasts, the mechanism by which N-BPs enter the cytosol from the extracellular space to reach their molecular target is not understood. Here we implemented a CRISPRi-mediated genome-wide screen and identified SLC37A3 (solute carrier family 37 member A3) as a gene required for the action of N-BPs in mammalian cells. We observed that SLC37A3 forms a complex with ATRAID (all-trans retinoic acid-induced differentiation factor), a previously identified genetic target of N-BPs. SLC37A3 and ATRAID localize to lysosomes and are required for releasing N-BP molecules that have trafficked to lysosomes through fluid-phase endocytosis into the cytosol. Our results elucidate the route by which N-BPs are delivered to their molecular target, addressing a key aspect of the mechanism o...
Produce resources to unravel the interface between insulin action, insulin resistance and the genetics of type 2 diabetes including an annotated public database, standardized protocols for gene expression and proteomic analysis, and ultimately diabetes-specific and insulin action-specific DNA chips for investigators in the field. The project aims to identify the sets of the genes involved in insulin action and the predisposition to type 2 diabetes, as well as the secondary changes in gene expression that occur in response to the metabolic abnormalities present in diabetes. There are five major and one pilot project involving human and rodent tissues that are designed to:
* Create a database of the genes expressed in insulin-responsive tissues, as well as accessible tissues, that are regulated by insulin, insulin resistance and diabetes.
* Assess levels and patterns of gene expression in each tissue before and after insulin stimulation in normal and genetically-modified rodents; normal, insulin resistant and diabetic humans, and in cultured and freshly isolated cell models.
* Correlate the level and patterns of expression at the mRNA and/or protein level with the genetic and metabolic phenotype of the animal or cell.
* Generate genomic sequence from a panel of humans with type 2 diabetes focusing on the genes most highly regulated by insulin and diabetes to determine the range of sequence and expression variation in these genes and the proteins they encode, which might affect the risk of diabetes or insulin resistance.
The DGAP project will define:
* the normal anatomy of gene expression, i.e., basal levels of expression and response to insulin.
* the morbid anatomy of gene expression, i.e., the impact of diabetes on expression patterns and the insulin response.
* the extent to which genetic variability might contribute to the alterations in expression or to diabetes itself.
This is the data table of the WCSdb: A database of Wild Coffea Species web site. The photo gallery associated with this web site is available at https://doi.org/10.23708/JZA8I2 and the sequencing data at https://doi.org/10.23708/KWRIJJ
Coffee is a beverage enjoyed by millions of people worldwide and an important commodity for millions of people. Besides the two cultivated species (Coffea arabica and Coffea canephora), the 139 wild coffee species belonging to the Coffea genus are largely unknown to coffee scientists and breeders, although these species may be crucial for future coffee crop development to face climate change. Here we present the Wild Coffee Species database (WCSdb) hosted by the Pl@ntNet platform (http://publish.plantnet-project.org/project/wildcofdb_en), providing information for 140 coffee species, of which 84 have a photo gallery and 82 have sequencing data (GBS, chloroplast or whole genome sequences). The objective of this database is to better understand and characterize the species (identification, morphology, biochemical compounds, genetic diversity, sequence data) in order to better protect and promote them.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A major challenge of plant biology is to unravel the genetic basis of complex traits. We took advantage of recent technical advances in high-throughput phenotyping in conjunction with genome-wide association studies to elucidate genotype-phenotype relationships at high temporal resolution. A diverse Brassica napus population from a commercial breeding programme was analysed by automated non-invasive phenotyping. Time-resolved data for early growth-related traits, including estimated biovolume, projected leaf area, early plant height, and colour uniformity, were established and complemented by fresh and dry weight biomass. Genome-wide SNP array data provided the framework for genome-wide association analyses. Using time point data and relative growth rates, multiple robust main effect marker-trait associations for biomass and related traits were detected. Candidate genes involved in meristem development, cell wall modification and transcriptional regulation were detected. Our results demonstrate that early plant growth is a highly complex trait governed by several medium and many small effect loci, most of which act only during narrowly defined phases. These observations highlight the importance of taking the temporal patterns of QTL/allele actions into account and emphasize the need for detailed time-resolved analyses to effectively unravel the complex and stage-specific contributions of genes affecting growth processes that operate at different developmental phases.
At high latitudes, climatic shifts hypothetically drove episodes of divergence during isolation in glacial refugia, or ice-free pockets of land that enabled terrestrial species persistence. Upon glacial recession, populations can expand and often come into contact, resulting in admixture between previously isolated groups. To understand how recurrent periods of isolation and contact have impacted evolution at high latitudes, we investigated introgression in the stoat (Mustela erminea), a Holarctic mammalian carnivore, using whole-genome sequences. We identify two temporally isolated introgression events coincident with large-scale climatic shifts: contemporary introgression in a mainland contact zone and ancient contact ~ 200 km south along North America’s North Pacific Coast. Repeated episodes of gene flow highlight the central role of cyclic climates in structuring high-latitude diversity, through refugial divergence and subsequent introgressive hybridization. Introgression followed by allopatry (e.g., insularization) may contribute to expedited divergence of island taxa experiencing substantial glacial flux.
Tar ball of genome .fq files of Mustela erminea (N=10): genome .fq files of Mustela erminea, representing specimens from the Museum of Southwestern Biology (MSB) with catalog numbers 152905 (ME01), 221783 (ME02), 199855 (ME03), 43333 (ME04), 145234 (ME05), 148962 (ME06), 149365 (ME07), 144524 (ME08), 215111 (ME09), 248153 (ME10). File: ermineGenome_fq_files.tar.gz
Tar ball of mitochondrial genome .fq files of Mustela erminea (N=10): mitochondrial genome .fq files of Mustela erminea, representing specimens from the Museum of Southwestern Biology (MSB) with catalog numbers 152905 (ME01), 221783 (ME02), 199855 (ME03), 43333 (ME04), 145234 (ME05), 148962 (ME06), 149365 (ME07), 144524 (ME08), 215111 (ME09), 248153 (ME10). File: ermineMitogenome_fq_files.gz
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ENCODES a protein that exhibits sodium channel regulator activity (ortholog); transmembrane transporter binding (ortholog); voltage-gated sodium channel activity involved in cardiac muscle cell action potential (ortholog); INVOLVED IN atrial cardiac muscle cell action potential (ortholog); cardiac conduction (ortholog); cardiac muscle cell action potential involved in contraction (ortholog); ASSOCIATED WITH Brugada syndrome (ortholog); Brugada syndrome 7 (ortholog); Cardiac Arrhythmias (ortholog); FOUND IN integral component of membrane (ortholog); plasma membrane (ortholog); voltage-gated sodium channel complex (ortholog)
The Ninu (Greater bilby, Macrotis lagotis) is a desert-dwelling, culturally and ecologically important marsupial. In collaboration with Indigenous rangers and conservation managers, we generated the first Ninu chromosome-level genome assembly (3.66 Gbp) and genome sequences for the extinct Yallara (Lesser bilby, Macrotis leucura). We developed and tested a scat SNP panel, based on our genomic datasets, to inform current and future conservation actions, to undertake future ecological assessments, and improve our understanding of Ninu genetic diversity in managed and wild populations. We also assessed the beneficial impact of targeted conservation actions, like translocations, in the contemporary metapopulation (N=363 Ninu). Resequenced genomes (temperate Ninu=6; semi-arid Ninu=6; Yallara=4) revealed two major population crashes during global cooling events for both species and differences in Ninu genes involved in anatomical and metabolic pathway adaptations to aridity. Despite their 45-...
There are two datasets included in the Excel workbook:
A set of SNPs (n=9906) generated using DArTseq, a form of reduced representation sequencing, and used in the population genetic analyses of the Ninu (n=363 samples). See Supplementary Note 2.4 for details of SNP calling and filtering. Briefly, reads were cleaned and aligned to the Ninu reference genome generated in this study (v1.9). Variants were called with Stacks and filtered to retain SNPs with a minor allele frequency of ≥0.01, minimum average allelic depth of 2.5x per allele, allelic coverage difference ≤80%, call rate ≥70%, locus heterozygosity ≤90%, and reproducibility between technical replicates ≥90%, and to remove putatively sex-linked SNPs.
A set of SNPs (n=35) generated using DArTseq, a form of reduced representation sequencing, and selected for use in the MassARRAY Ninu scat genotyping assay. The DArTseq reads were mapped to an earlier version of the Ninu reference genome generated in this study (v1.4.3). See Supplementar...
# Extant and extinct bilby genomes combined with Indigenous knowledge improve conservation of a unique Australian marsupial
https://doi.org/10.5061/dryad.gtht76htz
This accession contains both code used to generate the MassARRAY SNPs and DArTseq genotype data associated with the Ninu (Greater bilby, Macrotis lagotis) 1) scat genotyping MassARRAY design and 2) population genetic analysis described in Hogg et al.
The text file "Bilby_03_Supplementary_Text_File-code-final" contains R code in RMarkdown format used in the design of the SNP panel used in the custom MassARRAY scat genotyping assay. The code is annotated throughout and includes the following steps:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ENCODES a protein that exhibits A-type (transient outward) potassium channel activity (ortholog); monoatomic ion channel activity (ortholog); potassium channel activity (ortholog); INVOLVED IN action potential (ortholog); cardiac muscle cell action potential (ortholog); cellular response to hypoxia (ortholog); PARTICIPATES IN alfentanil pharmacodynamics pathway; bupivacaine pharmacodynamics pathway; buprenorphine pharmacodynamics pathway; ASSOCIATED WITH cholangiocarcinoma (ortholog); early myoclonic encephalopathy (ortholog); genetic disease (ortholog); FOUND IN caveola (ortholog); dendrite (ortholog); dendritic spine (ortholog); INTERACTS WITH potassium atom
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
While the hypothalamo-pituitary-adrenal axis (HPA) activates a general stress response by increasing glucocorticoid (Gc) synthesis, biological stress resulting from infections triggers the inflammatory response through production of cytokines. The pituitary gland integrates some of these signals by responding to the pro-inflammatory cytokines IL6 and LIF and to a negative Gc feedback loop. The present work used whole-genome approaches to define the LIF/STAT3 regulatory network and to delineate cross-talk between this pathway and Gc action. Genome-wide ChIP-chip identified 3,449 STAT3 binding sites, whereas 2,396 genes regulated by LIF and/or Gc were found by expression profiling. Surprisingly, LIF on its own changed expression of only 85 genes but the joint action of LIF and Gc potentiated the expression of more than a thousand genes. Accordingly, activation of both LIF and Gc pathways also potentiated STAT3 and GR recruitment to many STAT3 targets. Our analyses revealed an unexpected gene cluster that requires both stimuli for delayed activation; 83% of the genes in this cluster are involved in different cell defense mechanisms. Thus, stressors that trigger both general stress and inflammatory responses lead to activation of a stereotypic innate cellular defense response.
Methicillin-resistant Staphylococcus aureus (MRSA) infection is becoming refractory to existing antibiotic therapy owing to the inherent ability of S. aureus to develop rapid resistance and is considered a major threat to public health. We found that a natural isolate of Bacillus pumilus from the Columbia River Estuary produces a strong anti-MRSA compound, amicoumacin A. As amicoumacin A has been reported to exhibit anti-microbial, anti-inflammatory, and anti-ulcer activities, we sought to uncover its mechanism of action. Genome-wide transcriptome analysis of S. aureus COL in response to amicoumacin A showed alteration in the expression of genes involved in several cellular processes including cell envelope turnover, cross-membrane transport, virulence, metabolism, and general stress response. The most highly induced gene was lrgA, encoding an antiholin-like product, which has been shown to be induced in response to a collapse of membrane potential. In order to gain further insight into the mechanism of action of amicoumacin A, a whole genome comparison of wild-type COL and amicoumacin A-resistant mutants isolated by the serial passage method was carried out. Single point mutations resulting in codon substitutions were uncovered in several distinct genes: ksgA, RNA dimethyltransferase; fusA, elongation factor G; dnaG, primase; lacD, tagatose 1,6-bisphosphate aldolase; and SACOL0611, encoding a putative glycosyl transferase. Based on these results, a candidate approach was undertaken to recreate the same amino acid substitution individually in FusA and KsgA, each of which resulted in two-fold resistance towards amicoumacin A. The fusA gene is known as the site for fusidic acid-resistant mutations; however, the codon substitutions in EF-G that cause amicoumacin A resistance and fusidic acid resistance occur in separate domains and do not bring about cross resistance.
Taken together, these results suggest that amicoumacin A might cause perturbation of the cell membrane and lead to energy dissipation. Decreased rates of cellular metabolism including protein synthesis and DNA replication in resistant strains might allow cells to compensate for membrane dysfunction and thus increase cell survivability.
Amicoumacin A, isolated from Bacillus pumilus, was added to exponentially growing cultures (OD600 = 0.5) of Staphylococcus aureus COL at concentrations leading to around 12% and 20% reduction of OD600 after 10 min and 40 min, respectively. Total RNA was isolated from three biological replicates. Labeled cDNA from treated and control cultures (Cy5) was hybridized against a common reference cDNA pool (Cy3). The reference pool was prepared from a mixture of equal amounts of total RNA isolated from all stress and control samples in the experiment.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Protein-Protein, Genetic, and Chemical Interactions for van Werven FJ (2008): Cooperative action of NC2 and Mot1p to regulate TATA-binding protein function across the genome. Curated by BioGRID (https://thebiogrid.org).
ABSTRACT: Promoter recognition by TATA-binding protein (TBP) is an essential step in the initiation of RNA polymerase II (pol II) mediated transcription. Genetic and biochemical studies in yeast have shown that Mot1p and NC2 play important roles in inhibiting TBP activity. To understand how TBP activity is regulated in a genome-wide manner, we profiled the binding of TBP, NC2, Mot1p, TFIID, SAGA, and pol II across the yeast genome using chromatin immunoprecipitation (ChIP)-chip for cells in exponential growth and during reprogramming of transcription. We find that TBP, NC2, and Mot1p colocalize at transcriptionally active pol II core promoters. Relative binding of NC2alpha and Mot1p is higher at TATA promoters, whereas NC2beta has a preference for TATA-less promoters. In line with the ChIP-chip data, we isolated a stable TBP-NC2-Mot1p-DNA complex from chromatin extracts. ATP hydrolysis releases NC2 and DNA from the Mot1p-TBP complex. In vivo experiments indicate that promoter dissociation of TBP and NC2 is highly dynamic, which is dependent on Mot1p function. Based on these results, we propose that NC2 and Mot1p cooperate to dynamically restrict TBP activity on transcribed promoters.
https://www.nist.gov/open/license
The dataset consists of 375 extracted quotes from 31 community reports relevant to the development of a materials data strategy for the NIST Materials Measurement Laboratory (MML). The dataset is used in the NIST internal report "A Materials Data Strategy." In the past decade, numerous public and private sector documents have highlighted the need for materials data to facilitate advanced technologies in myriad industrial and economic sectors. These documents have been analyzed to identify prevalent gaps in the establishment of an interconnected materials data infrastructure akin to that envisioned in the federal agency-wide Materials Genome Initiative. The internal report uses a uniform schematic format to portray these gaps, illustrate progress in addressing the gaps, and propose an MML roadmap of action items to further address the gaps.
Environmental variation is increasingly recognized as an important driver of diversity in marine species despite the lack of physical barriers to dispersal and the presence of pelagic stages in many taxa. A robust understanding of the genomic and ecological processes involved in structuring populations is lacking for most marine species, often hindering management and conservation action. Cunner (Tautogolabrus adspersus) is a temperate reef fish with both pelagic early life history stages and strong site-associated homing as adults; the species is also of interest for use as a cleaner fish in salmonid aquaculture in Atlantic Canada. We aimed to characterize genomic and geographic differentiation of cunner in the Northwest Atlantic. To achieve this, a chromosome-level genome assembly for cunner was produced and used to characterize spatial population structure throughout Atlantic Canada using whole genome resequencing. The genome assembly spanned 0.72 Gbp and 24 chromosomes; whole genom...
https://spdx.org/licenses/CC0-1.0.html
Local adaptation is facilitated by loci clustered in relatively few regions of the genome, termed genomic islands of divergence. The mechanisms that create and maintain these islands and how they contribute to adaptive divergence are active research topics. Here, we use sockeye salmon as a model to investigate both the mechanisms responsible for creating islands of divergence and the patterns of differentiation at these islands. Previous research suggested that multiple islands contributed to adaptive radiation of sockeye salmon. However, the low-density genomic methods used by these studies made it difficult to fully elucidate the mechanisms responsible for islands and connect genotypes to adaptive variation. We used whole genome resequencing to genotype millions of loci to investigate patterns of genetic variation at islands and the mechanisms that potentially created them. We discovered 64 islands, including 16 clustered in four genomic regions shared between two isolated populations. Characterization of these four regions suggested that three were likely created by structural variation, while one was created by processes not involving structural variation. All four regions were small (< 600 kb), suggesting low recombination regions do not have to span megabases to be important for adaptive divergence. Differentiation at islands was not consistently associated with established population attributes. In sum, the landscape of adaptive divergence and the mechanisms that create it are complex; this complexity likely helps to facilitate fine-scale local adaptation unique to each population.
Methods
Sampling design
We resequenced genomes of sockeye salmon from seven populations in Southwest Alaska, USA (these samples are a subset of those analyzed in Larson et al., 2019). Fin-clips from 27 individuals per population (189 individuals total) were obtained from three lake-type spawning populations in each of the Kvichak River and Wood River drainages as well as one putatively ancestral sea/river population in the Nushagak River drainage. Lake-type samples were further subdivided into the following groups based on spawning habitat: mainland beaches, island beaches, creeks, and rivers. Mainland and island beaches are similar except island beaches are found in the middle of lakes where they are highly affected by wind and wave action (Stewart et al., 2003). Creeks are narrow (< 5 m wide) and shallow (< 0.5 m deep on average) while rivers are wide (> 30 m wide), deep (> 0.5 m deep), and fast flowing (Quinn et al., 2001). All samples were collected from spawning adults by the Alaska Department of Fish and Game between 1999 and 2013 and provided as extracted DNA (extracted with Qiagen DNeasy Blood and Tissue Kits, Hilden, Germany).
Whole genome library preparation and sequencing
Libraries were prepared according to Baym et al. (2015) and Therkildsen and Palumbi (2017) with the following modifications. Input DNA was normalized to 10 ng for each individual. Steps for 96-well AMPure XP (Beckman Coulter; Brea, CA) purification; product quantification, normalization, and pooling; and size selection were replaced with a SequalPrep (ThermoFisher Scientific, Waltham, MA, USA) normalization and pooling protocol, similar to that used in GT-seq (Campbell et al., 2015). We used three SequalPrep plates for each of the two 96-well tagmented and adaptor-ligated DNA library plates and pooled the full eluate per individual DNA library to increase total yield. Normalized pooled libraries were subject to a 0.6X size selection, purification, and volume concentration with AMPure XP following Therkildsen and Palumbi (2017). In-house QC consisted of visualization on a precast 2% agarose E-Gel (ThermoFisher Scientific) and quantification with a Qubit HS dsDNA Assay Kit (ThermoFisher Scientific). We constructed two libraries each containing 96 individuals and each of these libraries was sequenced on three NovaSeq S4 lanes (six lanes total) at Novogene (Sacramento, CA, USA).
Genotype calling and quality control
Variants and genotypes were called using the Genome Analysis Toolkit (GATK) version 4.1.7 (DePristo et al., 2011; McKenna et al., 2010) and a protocol that closely followed Christensen et al. (2020). Paired-end reads were aligned to the sockeye salmon genome (GCF_006149115.2; Christensen et al., 2020) with BWA MEM v.0.7.17 (Li, 2013) and indexed and sorted with Samtools v.1.10 (Li et al., 2009). Next, readgroups for each alignment file (bam file) were assigned using Picard v2.22.6 (AddOrReplaceReadGroups; http://broadinstitute.github.io/picard). Individual bam files produced on separate sequencing lanes were merged, and PCR duplicates were marked using the MarkDuplicates function from Picard with stringency set to “LENIENT”. Individual genomic VCF files (gvcf) were generated from alignments using HaplotypeCaller from GATK. A single database was created containing all individual gvcf files using GenomicsDBImport from GATK. Once the variants from all individuals had been added to the database, joint-genotyping was conducted using the GenotypeGVCFs function. The resulting variant file (vcf) was then hard filtered using the VariantFiltration function (filter expression = QD < 2.0 || FS > 60.0 || SOR < 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0). All variants that passed the hard filter were used in conjunction with three datasets used previously as truth datasets by Christensen et al. (2020) for GATK’s VariantRecalibrator function. The tranches file generated by VariantRecalibrator was subsequently used as input for the ApplyVQSR function to produce a corrected vcf file, which was submitted to additional variant filtration in VCFtools v.0.1.16 (parameters: --maf 0.05, --max-alleles 2, --min-alleles 2, --max-missing 0.9, --remove-filtered-all --remove-indels; Danecek et al., 2011). Finally, loci with an allele balance of less than 0.2 were marked. The resulting vcf file constituted our baseline file for all other analyses and downstream processing.
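The hard-filter expression quoted above joins its clauses with ||, so a variant is flagged as soon as any single clause holds. The following minimal sketch reproduces that logic on a plain dictionary of annotation values. It is not GATK code: the function names and the two example records are hypothetical, the thresholds are copied verbatim from the expression in the text, and the handling of missing annotations is an assumption of the sketch.

```python
# Thresholds copied verbatim from the filter expression quoted in the text.
HARD_FILTER_CLAUSES = [
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("SOR", lambda v: v < 3.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]

def failed_clauses(annotations):
    """Return the names of the clauses that a variant's annotations satisfy.

    Assumption of this sketch: an annotation missing from the dictionary
    leaves its clause unsatisfied.
    """
    return [
        name
        for name, is_triggered in HARD_FILTER_CLAUSES
        if name in annotations and is_triggered(annotations[name])
    ]

def is_filtered(annotations):
    """Clauses are joined with ||, so one satisfied clause is enough to flag the variant."""
    return len(failed_clauses(annotations)) > 0

# Hypothetical annotation values for two variants.
kept = {"QD": 25.0, "FS": 1.2, "SOR": 3.4, "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
flagged = {"QD": 1.5, "FS": 75.0, "SOR": 3.4, "MQRankSum": 0.1, "ReadPosRankSum": 0.3}

print(is_filtered(kept), failed_clauses(flagged))
```

Returning the names of the triggered clauses, rather than only a boolean, makes it easy to tabulate which annotation removes the most variants.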
Creating a merged dataset
Because the islands of divergence we identified were consistent among spatially isolated drainages in Alaska, we hypothesized that these regions may be conserved in other sockeye populations. To test this, we merged the dataset generated in the present study with whole-genome data from 78 sockeye salmon (kokanee excluded) from Christensen et al. (2020). This dataset was sequenced to a similar depth of coverage and was processed using an almost identical GATK4 pipeline. The dataset included 16 spawning populations that we grouped into five drainage regions: Bristol Bay (N = 12 individuals), Fraser/Columbia river basins (N = 47), Gulf of Alaska (N = 8), Northern British Columbia (N = 9), and Russia (N = 2). The variants identified in Christensen et al. (2020) were merged with ours using bcftools v.1.11 (Danecek et al., 2021) by retaining variants that intersected between the two datasets, had a genotyping rate > 80%, and were positioned within one of the refined haploblock regions.
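The merging rule described above (keep variants present in both datasets, with a genotyping rate above 80%, and located inside a haploblock region) can be sketched in plain Python. This illustrates the selection logic only and is not the bcftools commands used in the study. The data layout, function name, and example values are hypothetical, and computing the genotyping rate over the combined samples is an assumption.

```python
def merge_variants(ds_a, ds_b, regions, min_rate=0.8):
    """Intersect two variant sets and keep well-genotyped variants inside given regions.

    ds_a, ds_b: dict mapping (chrom, pos) -> list of genotypes (None = missing call).
    regions: list of (chrom, start, end) tuples, inclusive coordinates.
    """
    merged = {}
    for chrom, pos in ds_a.keys() & ds_b.keys():
        in_region = any(
            c == chrom and start <= pos <= end for c, start, end in regions
        )
        if not in_region:
            continue
        genotypes = ds_a[(chrom, pos)] + ds_b[(chrom, pos)]
        rate = sum(g is not None for g in genotypes) / len(genotypes)
        if rate > min_rate:
            merged[(chrom, pos)] = genotypes
    return merged

# Hypothetical toy datasets: genotypes coded as alternate-allele counts.
ds_a = {
    ("chr1", 100): [0, 1, 2],
    ("chr1", 200): [0, None, None],
    ("chr2", 50): [1, 1, 1],
}
ds_b = {
    ("chr1", 100): [1, 1],
    ("chr1", 200): [None, 1],
    ("chr1", 900): [0, 0],
    ("chr2", 50): [0, 0],
}
regions = [("chr1", 50, 500)]

merged = merge_variants(ds_a, ds_b, regions)
print(merged)
```

In this toy example only chr1:100 survives: chr1:200 has too many missing calls, chr1:900 is absent from the first dataset, and chr2:50 lies outside the region.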
https://spdx.org/licenses/CC0-1.0.html
The erosion of habitat heterogeneity can reduce species diversity directly but can also lead to the loss of distinctiveness of sympatric species through speciation reversal. We know little about changes in genomic differentiation during the early stages of these processes, which can be mediated by anthropogenic perturbation. Here, we analyse three sympatric whitefish species (Coregonus spp) sampled across two neighbouring and connected Swiss pre-alpine lakes, which have been differentially affected by anthropogenic eutrophication. Our data set comprises 16,173 loci genotyped across 138 whitefish using restriction-site associated DNA sequencing (RADseq). Our analysis suggests that in each of the two lakes the population of a different, but ecologically similar, whitefish species declined following a recent period of eutrophication. Genomic signatures consistent with hybridisation are more pronounced in the more severely impacted lake. Comparisons between sympatric pairs of whitefish species with contrasting ecology, where one is shallow benthic and the other one more profundal pelagic, reveal genomic differentiation that is largely correlated along the genome, while differentiation is uncorrelated between pairs of allopatric provenance with similar ecology. We identify four genomic loci that provide evidence of parallel divergent adaptation between the shallow benthic species and the two different more profundal species. Functional annotations available for two of those loci are consistent with divergent ecological adaptation. Our genomic analysis indicates the action of divergent natural selection between sympatric whitefish species in pre-alpine lakes and reveals the vulnerability of these species to anthropogenic alterations of the environment and associated adaptive landscape.