3 datasets found

Stanford University HIV Drug Resistance Database
neuinfo.org
rrid.site
+1 more
Updated Jun 11, 2018
Cite
(2018). Stanford University HIV Drug Resistance Database [Dataset]. http://identifiers.org/RRID:SCR_006631
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006631 https://identifiers.org/RRID:SCR_006631/resolver?q=&i=rrid
Dataset updated
Jun 11, 2018
Description
The Stanford University HIV Drug Resistance Database (HIVDB) is a curated public database designed to represent, store, and analyze the different forms of data underlying HIV drug resistance. HIVDB has three main types of content: (1) Database queries and references, (2) Interactive programs, and (3) Educational resources. Database queries are designed primarily for researchers studying HIV drug resistance. The interactive programs and educational resources are designed for both researchers and those wishing to learn more about HIV drug resistance.

1. DATABASE QUERY AND REFERENCE PAGES

Genotype-Treatment Correlations

The Genotype-Treatment section of the database links to 15 interactive query pages that explore the relationship between treatment with HIV-1 antiretroviral drugs (ARVs) and mutations in HIV reverse transcriptase (RT), protease, and integrase. There are five types of interactive query pages:

-Treatment Profiles (Protease and RT inhibitors)
-Mutation Profiles (Protease and RT mutations)
-Detailed Treatment Queries (Protease, RT, and integrase inhibitors)
-Detailed Mutation Queries (Protease, RT, and integrase mutations)
-Mutation Prevalence According to Subtype and Treatment

Genotype-Phenotype Correlations

The main page of the Genotype-Phenotype Correlations section links to four interactive query pages: three dynamically updated data summaries and one regularly updated downloadable dataset.

-Drug Resistance Positions: query for levels of resistance associated with known drug resistance mutations
-Detailed Phenotype Queries: queries for levels of resistance associated with individual mutations or mutation combinations at all positions of protease, RT, and integrase
-Patterns of Drug Resistance Mutations
-Downloadable Reference Dataset

Genotype-Clinical Correlations

This part of the database has two main sections:

-Clinical Trials Datasets
-Summaries of Clinical Studies

References

This part of the database has two main sections: one with summaries of the data from each of the references in HIVDB and one in which every primate immunodeficiency virus sequence in GenBank is annotated according to its presence or absence in HIVDB.

-Studies in HIVDB
-GenBank HIVDB

New Submissions

Approximately every three months, the New Submissions section lists the studies that have been entered into HIVDB. The study title links to the introductory page of the study in the References section.

Database Statistics (http://hivdb.stanford.edu/pages/HIVdbStatistics.html)

2. INTERACTIVE PROGRAMS

HIVDB has seven main interactive programs.

1. HIVdb Program: Mutation List Analysis, Sequence Analysis, HIVdb Output, Sierra Web Service, Release Notes, Algorithm Specification Interface (ASI)
2. HIValg Program
3. HIVseq Program
4. Calibrated Population Resistance (CPR) tool
5. Mutation ARV Evidence Listing (MARVEL)
6. ART-AiDE
7. Rega HIV-1 Subtyping tool

Three programs in the HIV Drug Resistance Database share a common code base: HIVseq, HIVdb, and HIValg.

HIVseq accepts user-submitted protease, RT, and integrase sequences, compares them to the consensus subtype B reference sequence, and uses the differences as query parameters for interrogating the HIV Drug Resistance Database (Shafer, D Jung, & B Betts, Nat Med 2000; Rhee SY et al. AIDS 2006). The query result provides users with the prevalence of protease, RT, and integrase mutations according to subtype and to protease inhibitor (PI), nucleoside RT inhibitor (NRTI), non-nucleoside RT inhibitor (NNRTI), and integrase inhibitor (INI) exposure. This allows users to detect unusual sequence results immediately so that the person doing the sequencing can check the primary sequence output while it is still on the desktop. In addition, unexpected associations between sequences or isolates can be discovered by immediately retrieving data on isolates sharing one or more mutations with the sequence.

There are three ways in which the HIVdb program can be used: (i) entering a list of protease and RT mutations, (ii) entering a complete sequence containing protease, RT, and/or integrase, and (iii) using a Web Service. HIVdb is an expert system that accepts user-submitted HIV-1 pol sequences and returns inferred levels of resistance to 20 FDA-approved ARV drugs including 8 PIs, 7 NRTIs, 4 NNRTIs, and - with this update - one INI. In the HIVdb system, each HIV-1 drug resistance mutation is assigned a drug penalty score and a comment; the total score for a drug is derived by adding the scores of each mutation associated with resistance to that drug. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance.

HIValg is designed for users interested in comparing the results of different algorithms, or in comparing and evaluating existing and newly developed algorithms. The ability to develop new algorithms that can be run on the HIV Drug Resistance Database depends on the Algorithm Specification Interface (ASI) compiler (Shafer & Betts JCM 2003).

Submission of Sequences and Mutations

For each of the three programs, sequences can be entered using either the Sequence Analysis Form or the Mutation List form.

3. EDUCATIONAL RESOURCES

HIVDB contains several regularly updated sections summarizing data linking RT, protease, and integrase mutations and antiretroviral drugs (ARVs). These sections include (i) tabular summaries of the major mutations associated with each ARV class, (ii) detailed summaries of the major, minor, and accessory mutations associated with each ARV, (iii) the comments used by the HIVdb program, (iv) the scores used by the HIVdb program, (v) clinical studies in which baseline drug resistance mutations have been correlated with the virological response (clinical outcome) to a specific ARV, (vi) mutations that can be used for drug resistance surveillance, and (vii) a two-page PDF handout.

1. Drug Resistance Summaries
-Tabular Drug Resistance Summaries by ARV Class
-Detailed Drug Resistance Summaries by ARV
-Drug Resistance Mutation Comments Used by the HIVdb Program
-Drug Resistance Mutation Scores Used by the HIVdb Program
-Genotype-Clinical Outcome Correlation Studies
2. Surveillance Drug-Resistance Mutation List Section
3. PDF Handout

Grant Support

1. National Institute for Allergy and Infectious Diseases (NIAID, NIH): Online HIV Drug Resistance Database (PI: Robert W. Shafer, MD, 1R01AI68581-01A1), 04/01/06 - 3/31/11
2. National Institute for Allergy and Infectious Diseases (NIAID, NIH) supplement to the grant Identification of Multidrug-Resistant HIV-1 Isolates (PI: Robert W. Shafer, MD, AI46148-01): Supplement provided 1999-2005.
3. NIH/NIGMS Program Project on AIDS Structural Biology Program Project: Targeting Ensembles of Drug Resistant Protease Variants (PI: Celia Schiffer, PhD, University of Massachusetts): 2002-2007
4. University-wide AIDS Research Program (CR03-ST-524). Community collaborative award: Optimizing Clinical HIV Genotypic Resistance Interpretation: Principal Investigators: Robert W. Shafer, MD and W. Jeffrey Fessel MD (Kaiser Permanente Medical Care Program): 2004-2005
5. Stanford University Bio-X Interdisciplinary Initiative: HIV Gene Sequence Analysis for Drug Resistance Studies: A Pharmacogenetic Challenge Principal Investigators: Robert W. Shafer, MD and Daphne Koller, Ph.D. (Computer Science): 2000-2002
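
The two steps described above (HIVseq's comparison against a reference sequence and HIVdb's additive penalty scoring) can be illustrated with a small sketch. This is not HIVDB code: the function names, the reference fragment, the penalty values, and the score thresholds are all assumptions chosen for illustration, and the real program takes its scores and cutoffs from the HIVdb release notes.

```python
# Illustrative sketch only. The thresholds, penalty scores, and sequences
# used here are assumed placeholder values, not data taken from HIVDB.

# (minimum total score, reported level), checked from highest to lowest.
ASSUMED_LEVELS = [
    (60, "high-level resistance"),
    (30, "intermediate resistance"),
    (15, "low-level resistance"),
    (10, "potential low-level resistance"),
]


def find_mutations(reference, query):
    """List differences from the reference as strings such as 'L5F'.

    Positions are 1-based; sequences are assumed to be pre-aligned
    amino-acid strings of equal length.
    """
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference, query), start=1)
        if ref != obs
    ]


def total_score(mutations, penalties):
    """Add the penalty score of every mutation; unlisted mutations add 0."""
    return sum(penalties.get(mutation, 0) for mutation in mutations)


def resistance_level(score):
    """Map a total drug score to one of the five reported levels."""
    for threshold, label in ASSUMED_LEVELS:
        if score >= threshold:
            return label
    return "susceptible"


if __name__ == "__main__":
    reference = "PQITLWQRPL"            # hypothetical reference fragment
    query = "PQITFWQRPV"                # hypothetical submitted fragment
    penalties = {"L5F": 20, "L10V": 15}  # hypothetical scores for one drug

    mutations = find_mutations(reference, query)
    score = total_score(mutations, penalties)
    print(mutations, score, resistance_level(score))
```

Keeping the thresholds in a single table makes it easy to swap in the values from a specific HIVdb release without touching the scoring logic.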
Data from: Wide range screening of algorithmic bias in word embedding models...
data.niaid.nih.gov
zenodo.org
zip
Updated Apr 7, 2020
Cite
David Rozado (2020). Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types [Dataset]. http://doi.org/10.5061/dryad.rbnzs7h7w
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.rbnzs7h7w
Dataset updated
Apr 7, 2020
Dataset provided by
Otago Polytechnic
Authors
David Rozado
License
https://spdx.org/licenses/CC0-1.0.html
Description
Concerns about gender bias in word embedding models have captured substantial attention in the algorithmic bias research literature. Other bias types however have received lesser amounts of scrutiny. This work describes a large-scale analysis of sentiment associations in popular word embedding models along the lines of gender and ethnicity but also along the less frequently studied dimensions of socioeconomic status, age, physical appearance, sexual orientation, religious sentiment and political leanings. Consistent with previous scholarly literature, this work has found systemic bias against given names popular among African-Americans in most embedding models examined. Gender bias in embedding models however appears to be multifaceted and often reversed in polarity to what has been regularly reported. Interestingly, using the common operationalization of the term bias in the fairness literature, novel types of so far unreported bias types in word embedding models have also been identified. Specifically, the popular embedding models analyzed here display negative biases against middle and working-class socioeconomic status, male children, senior citizens, plain physical appearance and intellectual phenomena such as Islamic religious faith, non-religiosity and conservative political orientation. Reasons for the paradoxical underreporting of these bias types in the relevant literature are probably manifold but widely held blind spots when searching for algorithmic bias and a lack of widespread technical jargon to unambiguously describe a variety of algorithmic associations could conceivably be playing a role. The causal origins for the multiplicity of loaded associations attached to distinct demographic groups within embedding models are often unclear but the heterogeneity of said associations and their potential multifactorial roots raises doubts about the validity of grouping them all under the umbrella term bias. 
Richer and more fine-grained terminology as well as a more comprehensive exploration of the bias landscape could help the fairness epistemic community to characterize and neutralize algorithmic discrimination more efficiently.

Methods

This data set collects several popular pre-trained word embedding models:

-Word2vec Skip-Gram trained on Google News corpus (100B tokens) https://code.google.com/archive/p/word2vec/

-GloVe trained on Wikipedia 2014 + Gigaword 5 (6B tokens) http://nlp.stanford.edu/data/glove.6B.zip

-GloVe trained on 2B tweets Twitter corpus (27B tokens) http://nlp.stanford.edu/data/glove.twitter.27B.zip

-GloVe trained on Common Crawl (42B tokens) http://nlp.stanford.edu/data/glove.42B.300d.zip

-GloVe trained on Common Crawl (840B tokens) http://nlp.stanford.edu/data/glove.840B.300d.zip

-FastText trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens) https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip

-FastText trained with subword information on Common Crawl (600B tokens) https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip
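
As a rough illustration of what a sentiment association in an embedding model can mean, the sketch below scores a target word by its mean cosine similarity to a positive lexicon minus its mean cosine similarity to a negative lexicon. This is a generic formulation and not necessarily the procedure used to produce this dataset; the function names and the two-dimensional toy vectors are hypothetical stand-ins for vectors loaded from one of the models listed above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentiment_association(word, positive, negative, vectors):
    """Mean similarity to positive terms minus mean similarity to negative terms.

    A value above zero means the word sits closer to the positive lexicon,
    a value below zero means it sits closer to the negative lexicon.
    """
    target = vectors[word]
    pos = sum(cosine(target, vectors[p]) for p in positive) / len(positive)
    neg = sum(cosine(target, vectors[n]) for n in negative) / len(negative)
    return pos - neg


# Hypothetical toy vectors; real models use hundreds of dimensions.
TOY_VECTORS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "term_a": [0.8, 0.6],
    "term_b": [-0.6, 0.8],
}

if __name__ == "__main__":
    for term in ("term_a", "term_b"):
        score = sentiment_association(term, ["good"], ["bad"], TOY_VECTORS)
        print(term, round(score, 3))
```

With real embeddings, the lexicons would contain many terms each, and scores for groups of words (for example, sets of given names) would be compared rather than read individually.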
Optum ZIP5 OMOP
redivis.com
application/jsonl +7
Updated Mar 3, 2021
Cite
Stanford Center for Population Health Sciences (2021). Optum ZIP5 OMOP [Dataset]. http://doi.org/10.57761/e54r-bg69
Explore at:
Available download formats: application/jsonl, arrow, parquet, spss, stata, avro, csv, sas
Unique identifier
https://doi.org/10.57761/e54r-bg69
Dataset updated
Mar 3, 2021
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

Optum ZIP5 v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/). This dataset covers 2003-Q1 to 2020-Q2.

Section 10

A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.

It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.


For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

Conventions

Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.

Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.

Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.


The text above is taken from the OMOP CDM v5.3 Specification document.
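
The 30-day Persistence Window convention can be sketched as a simple interval merge. This is a simplified illustration rather than the standardized OHDSI algorithm: it assumes the input holds (start, end) date pairs for a single person and a single condition_concept_id, and the function name and tuple layout are hypothetical.

```python
from datetime import date, timedelta

# Persistence Window stated in the conventions above.
PERSISTENCE_WINDOW = timedelta(days=30)


def build_condition_eras(occurrences, window=PERSISTENCE_WINDOW):
    """Merge condition occurrences into eras.

    occurrences: iterable of (start_date, end_date) pairs for one person
    and one condition concept (an assumption of this sketch).
    Returns a list of (era_start, era_end, occurrence_count) tuples.
    """
    eras = []
    for start, end in sorted(occurrences):
        if eras and start <= eras[-1][1] + window:
            # The occurrence starts within the window of the running era,
            # so it extends that era instead of opening a new one.
            eras[-1][1] = max(eras[-1][1], end)
            eras[-1][2] += 1
        else:
            eras.append([start, end, 1])
    return [tuple(era) for era in eras]


if __name__ == "__main__":
    visits = [
        (date(2020, 1, 1), date(2020, 1, 1)),    # PCP visit
        (date(2020, 1, 20), date(2020, 1, 20)),  # specialist, 19 days later
        (date(2020, 4, 1), date(2020, 4, 2)),    # unrelated later episode
    ]
    for era in build_condition_eras(visits):
        print(era)
```

In the example, the PCP and specialist visits fall within 30 days of each other and collapse into one era, while the April occurrence opens a second era.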

Section 8

The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the "Condition" Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.

Conventions

There is one record for each Domain. The domains are defined by the tables and fields in the OMOP CDM that can contain Concepts describing all the various aspects of the healthcare experience of a patient.

The domain_id field contains an alphanumerical identifier, that can also be used as the abbreviation of the Domain.

The domain_name field contains the unabbreviated name of the Domain.

Each Domain also has an entry in the Concept table, which is recorded in the domain_concept_id field. This is for purposes of creating a closed Information Model, where all entities in the OMOP CDM are covered by a unique Concept.


The text above is taken from the OMOP CDM v5.3 Specification document.

Section 12

A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when a Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.

Conventions

Drug Eras are derived from records in the DRUG_EXPOSURE table using a standardized algorithm.

Each Drug Era corresponds to one or many Drug Exposures that form a continuous interval and contain the same Drug Ingredient (active compound).

The drug_concept_id field only contains Concepts that have the concept_class 'Ingredient'. The Ingredient is derived from the Drug Concepts in the DRUG_EXPOSURE table that are aggregated into the Drug Era record.

The Drug Era Start Date is the start date of the first Drug Exposure.

The Drug Era End Date is the end date of the last Drug Exposure. The End Date of each Drug Exposure is either taken from the field drug_exposure_end_date or, as it is typically not available, inferred using the following rules:

The Gap Days determine how many total drug-free days are observed between all Drug Exposure events that contribute to a DRUG_ERA record. It is assumed that the drugs are "not stockpiled" by the patient, i.e. that if a new drug prescription or refill is observed (a new DRUG_EXPOSURE record is written), the remaining supply from the previous events is abandoned.

The difference between Persistence Window and Gap Days is that the former is the maximum drug-free time allowed between two subsequent DRUG_EXPOSURE records, while the latter is the sum of actual drug-free days for the given Drug Era under the abo
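
The distinction between Gap Days and the Persistence Window can be made concrete with a short sketch. It is an illustration under stated assumptions, not the standardized algorithm: exposures are (start, end) date pairs for one person and one ingredient, the drug-free time between two records is taken as the plain date difference between the earlier end and the later start, and the function names and the window value passed in are hypothetical.

```python
from datetime import date


def individual_gaps(exposures):
    """Drug-free days between each pair of consecutive exposures.

    exposures: iterable of (start_date, end_date) pairs. Overlaps count as
    zero: under the 'not stockpiled' assumption the remaining supply of
    the earlier record is abandoned when the next record starts.
    """
    ordered = sorted(exposures)
    return [
        max(0, (next_start - prev_end).days)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def gap_days(exposures):
    """Gap Days: the sum of all drug-free days inside one Drug Era."""
    return sum(individual_gaps(exposures))


def within_persistence_window(exposures, window_days):
    """True if no single gap exceeds the maximum allowed drug-free time."""
    return all(gap <= window_days for gap in individual_gaps(exposures))


if __name__ == "__main__":
    era = [
        (date(2020, 1, 1), date(2020, 1, 10)),
        (date(2020, 1, 15), date(2020, 1, 25)),
        (date(2020, 1, 20), date(2020, 2, 1)),  # refill before supply ends
    ]
    print(gap_days(era), within_persistence_window(era, window_days=30))
```

The window is a per-gap limit that decides whether two records belong to the same era, while Gap Days is a total reported for the era once it has been built.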
Stanford University HIV Drug Resistance Database

RRID:SCR_006631, nif-0000-21195, Stanford University HIV Drug Resistance Database (RRID:SCR_006631), HIVDB