The Stanford University HIV Drug Resistance Database is a curated public database designed to represent, store, and analyze the different forms of data underlying HIV drug resistance. HIVDB has three main types of content: (1) Database queries and references, (2) Interactive programs, and (3) Educational resources. Database queries are designed primarily for researchers studying HIV drug resistance. The interactive programs and educational resources are designed for both researchers and those wishing to learn more about HIV drug resistance.

1. DATABASE QUERY AND REFERENCE PAGES

Genotype-Treatment Correlations
This Genotype-Treatment section of the database links to 15 interactive query pages that explore the relationship between treatment with HIV-1 antiretroviral drugs (ARVs) and mutations in HIV reverse transcriptase (RT), protease, and integrase. There are five types of interactive query pages:
-Treatment Profiles (Protease and RT inhibitors)
-Mutation Profiles (Protease and RT mutations)
-Detailed Treatment Queries (Protease, RT, and integrase inhibitors)
-Detailed Mutation Queries (Protease, RT, and integrase mutations)
-Mutation Prevalence According to Subtype and Treatment

Genotype-Phenotype Correlations
The main page of the Genotype-Phenotype Correlations section links to four interactive query pages: three dynamically updated data summaries and one regularly updated downloadable dataset.
-Drug Resistance Positions: Query for levels of resistance associated with known drug resistance mutations
-Detailed Phenotype Queries: Queries for levels of resistance associated with individual mutations or mutation combinations at all positions of protease, RT, and integrase
-Patterns of Drug Resistance Mutations
-Downloadable Reference Dataset

Genotype-Clinical Correlations
This part of the database has two main sections:
-Clinical Trials Datasets
-Summaries of Clinical Studies

References
This part of the database has two main sections: one with summaries of the data from each of the references in HIVDB and one in which every primate immunodeficiency virus sequence in GenBank is annotated according to its presence or absence in HIVDB.
-Studies in HIVDB
-GenBank HIVDB

New Submissions
Approximately every three months, the New Submissions section lists the studies that have been entered into HIVDB. The study title links to the introductory page of the study in the References section.

Database Statistics (http://hivdb.stanford.edu/pages/HIVdbStatistics.html)

2. INTERACTIVE PROGRAMS

HIVDB has seven main interactive programs.
1. HIVdb Program
   -Mutation List Analysis
   -Sequence Analysis
   -HIVdb Output
   -Sierra Web Service
   -Release Notes
   -Algorithm Specification Interface (ASI)
2. HIValg Program
3. HIVseq Program
4. Calibrated Population Resistance (CPR) tool
5. Mutation ARV Evidence Listing (MARVEL)
6. ART-AiDE
7. Rega HIV-1 Subtyping tool

Three programs in the HIV Drug Resistance Database share a common code base: HIVseq, HIVdb, and HIValg.

HIVseq accepts user-submitted protease, RT, and integrase sequences, compares them to the consensus subtype B reference sequence, and uses the differences as query parameters for interrogating the HIV Drug Resistance Database (Shafer, D Jung, & B Betts, Nat Med 2000; Rhee SY et al. AIDS 2006). The query result provides users with the prevalence of protease, RT, and integrase mutations according to subtype and PI, nucleoside RT inhibitor (NRTI), non-nucleoside RT inhibitor (NNRTI), and integrase inhibitor (INI) exposure. This allows users to detect unusual sequence results immediately so that the person doing the sequencing can check the primary sequence output while it is still on the desktop. In addition, unexpected associations between sequences or isolates can be discovered by immediately retrieving data on isolates sharing one or more mutations with the sequence.

There are three ways in which the HIVdb program can be used: (i) entering a list of protease and RT mutations, (ii) entering a complete sequence containing protease, RT, and/or integrase, and (iii) using a Web Service. HIVdb is an expert system that accepts user-submitted HIV-1 pol sequences and returns inferred levels of resistance to 20 FDA-approved ARV drugs including 8 PIs, 7 NRTIs, 4 NNRTIs, and - with this update - one INI. In the HIVdb system, each HIV-1 drug resistance mutation is assigned a drug penalty score and a comment; the total score for a drug is derived by adding the scores of each mutation associated with resistance to that drug. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance.

HIValg is designed for users interested in comparing the results of different algorithms or who are interested in comparing and evaluating existing and newly developed algorithms. The ability to develop new algorithms that can be run on the HIV Drug Resistance Database depends on the Algorithm Specification Interface (ASI) compiler (Shafer & Betts JCM 2003).

Submission of Sequences and Mutations
For each of the three programs, sequences can be entered using either the Sequence Analysis Form or the Mutation List form.

3. EDUCATIONAL RESOURCES

HIVDB contains several regularly updated sections summarizing data linking RT, protease, and integrase mutations and antiretroviral drugs (ARVs). These sections include (i) tabular summaries of the major mutations associated with each ARV class, (ii) detailed summaries of the major, minor, and accessory mutations associated with each ARV, (iii) the comments used by the HIVdb program, (iv) the scores used by the HIVdb program, (v) clinical studies in which baseline drug resistance mutations have been correlated with the virological response (clinical outcome) to a specific ARV, (vi) mutations that can be used for drug resistance surveillance, and (vii) a two-page PDF handout.
1. Drug Resistance Summaries
   -Tabular Drug Resistance Summaries by ARV Class
   -Detailed Drug Resistance Summaries by ARV
   -Drug Resistance Mutation Comments Used by the HIVdb Program
   -Drug Resistance Mutation Scores Used by the HIVdb Program
   -Genotype-Clinical Outcome Correlation Studies
2. Surveillance Drug-Resistance Mutation List Section
3. PDF Handout

Grant Support
1. National Institute of Allergy and Infectious Diseases (NIAID, NIH): Online HIV Drug Resistance Database (PI: Robert W. Shafer, MD, 1R01AI68581-01A1), 04/01/06 - 3/31/11
2. National Institute of Allergy and Infectious Diseases (NIAID, NIH) supplement to the grant Identification of Multidrug-Resistant HIV-1 Isolates (PI: Robert W. Shafer, MD, AI46148-01): Supplement provided 1999-2005.
3. NIH/NIGMS Program Project on AIDS Structural Biology Program Project: Targeting Ensembles of Drug Resistant Protease Variants (PI: Celia Schiffer, PhD, University of Massachusetts): 2002-2007
4. University-wide AIDS Research Program (CR03-ST-524). Community collaborative award: Optimizing Clinical HIV Genotypic Resistance Interpretation: Principal Investigators: Robert W. Shafer, MD and W. Jeffrey Fessel MD (Kaiser Permanente Medical Care Program): 2004-2005
5. Stanford University Bio-X Interdisciplinary Initiative: HIV Gene Sequence Analysis for Drug Resistance Studies: A Pharmacogenetic Challenge Principal Investigators: Robert W. Shafer, MD and Daphne Koller, Ph.D. (Computer Science): 2000-2002
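The additive scoring described for the HIVdb program above can be illustrated with a minimal Python sketch. It is not the HIVdb implementation: the penalty table is a hypothetical placeholder, and the score cut-offs are assumptions, since the description names the five resistance levels but does not state the thresholds that separate them.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical penalty table (mutation -> drug -> penalty score).
# Placeholder values for illustration only; not taken from HIVdb.
PENALTIES: Dict[str, Dict[str, int]] = {
    "M184V": {"3TC": 60, "ABC": 15},
    "K65R": {"3TC": 30, "ABC": 45},
    "L74V": {"ABC": 30},
}

# Assumed lower bounds for each level, highest first. The description
# lists the five levels but not the cut-offs, so these are assumptions.
LEVELS: List[Tuple[int, str]] = [
    (60, "high-level resistance"),
    (30, "intermediate resistance"),
    (15, "low-level resistance"),
    (10, "potential low-level resistance"),
    (0, "susceptible"),
]


def total_score(mutations: Iterable[str], drug: str) -> int:
    """Add the penalty scores of every listed mutation for one drug."""
    return sum(PENALTIES.get(m, {}).get(drug, 0) for m in mutations)


def resistance_level(score: int) -> str:
    """Map a total drug score to one of the five reported levels."""
    for lower_bound, label in LEVELS:
        if score >= lower_bound:
            return label
    return "susceptible"  # a negative total falls through to here


if __name__ == "__main__":
    score = total_score(["M184V", "L74V"], "ABC")
    print(score, resistance_level(score))
```

Keeping the penalty table and the cut-offs as plain data keeps the scoring logic separate from the values, which is one simple way to swap in a different rule set for comparison.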
https://spdx.org/licenses/CC0-1.0.html
Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types, however, have received less scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity, but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work has found systemic bias against given names popular among African-Americans in most embedding models examined. Gender bias in embedding models, however, appears to be multifaceted and often reversed in polarity to what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel and so far unreported bias types in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class socioeconomic status, male children, senior citizens, plain physical appearance and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. Reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold, but widely held blind spots when searching for algorithmic bias and a lack of widespread technical jargon to unambiguously describe a variety of algorithmic associations could conceivably be playing a role. The causal origins for the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear, but the heterogeneity of said associations and their potential multifactorial roots raise doubts about the validity of grouping them all under the umbrella term bias. 
Richer and more fine-grained terminology as well as a more comprehensive exploration of the bias landscape could help the fairness epistemic community to characterize and neutralize algorithmic discrimination more efficiently.
Methods
This data set collects several popular pre-trained word embedding models:
-Word2vec Skip-Gram trained on Google News corpus (100B tokens) https://code.google.com/archive/p/word2vec/
-Glove trained on Wikipedia 2014 + Gigaword 5 (6B tokens) http://nlp.stanford.edu/data/glove.6B.zip
-Glove trained on 2B tweets Twitter corpus (27B tokens) http://nlp.stanford.edu/data/glove.twitter.27B.zip
-Glove trained on Common Crawl (42B tokens) http://nlp.stanford.edu/data/glove.42B.300d.zip
-Glove trained on Common Crawl (840B tokens) http://nlp.stanford.edu/data/glove.840B.300d.zip
-FastText trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens) https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip
-FastText trained with subword information on Common Crawl (600B tokens) https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip
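As a rough illustration of what a sentiment association in an embedding space can look like, the Python sketch below scores a word by its mean cosine similarity to a set of positive terms minus its mean cosine similarity to a set of negative terms. This is an assumed, simplified operationalization and not necessarily the measure used in this work; the function names and the two-dimensional toy vectors are hypothetical stand-ins for vectors loaded from one of the models listed above.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentiment_association(
    word: str,
    positive: Sequence[str],
    negative: Sequence[str],
    vectors: Dict[str, Vector],
) -> float:
    """Mean similarity to positive terms minus mean similarity to negative terms."""
    target = vectors[word]
    pos = [cosine(target, vectors[p]) for p in positive]
    neg = [cosine(target, vectors[n]) for n in negative]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


# Hypothetical 2-d toy vectors; real models use hundreds of dimensions.
TOY_VECTORS: Dict[str, Vector] = {
    "good": (1.0, 0.0),
    "bad": (-1.0, 0.0),
    "alpha": (0.8, 0.6),
    "beta": (-0.6, 0.8),
}

if __name__ == "__main__":
    for w in ("alpha", "beta"):
        print(w, sentiment_association(w, ["good"], ["bad"], TOY_VECTORS))
```

A positive value means the word sits closer to the positive terms than to the negative ones; comparing such values across groups of terms is one way an association gap can be quantified.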
Optum ZIP5 v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/). This dataset covers 2003-Q1 to 2020-Q2.
A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:
For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.
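The aggregation in this example can be sketched in Python as a merge of date spans. The sketch assumes that two Condition Occurrences belong to the same Condition Era when the gap between them is at most 30 days; that window, the function name `build_eras`, and the sample dates are assumptions made for illustration and are not stated in the text above.

```python
from datetime import date, timedelta
from typing import List, Tuple

Span = Tuple[date, date]  # (start_date, end_date) of one occurrence or era


def build_eras(occurrences: List[Span], gap_days: int = 30) -> List[Span]:
    """Merge occurrences of one condition for one person into eras.

    An occurrence joins the current era when it starts no later than
    gap_days after that era's end; otherwise it opens a new era.
    The 30-day default is an assumed persistence window.
    """
    eras: List[Span] = []
    for start, end in sorted(occurrences):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            era_start, era_end = eras[-1]
            eras[-1] = (era_start, max(era_end, end))
        else:
            eras.append((start, end))
    return eras


if __name__ == "__main__":
    visits = [
        (date(2020, 1, 5), date(2020, 1, 5)),    # PCP visit
        (date(2020, 1, 20), date(2020, 1, 20)),  # specialist visit
    ]
    print(build_eras(visits))  # both visits fall into a single era
```

Sorting first means each occurrence only has to be compared with the most recent era, so the merge is a single pass over the records.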
Conventions
The text above is taken from the OMOP CDM v5.3 Specification document.
The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the "Condition" Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.
Conventions
The text above is taken from the OMOP CDM v5.3 Specification document.
A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.
Conventions