4 datasets found
  1. Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)

    • redivis.com
    application/jsonl +7
    Updated Jun 28, 2024
    Cite
    Stanford Doerr School of Sustainability Data Repository (2024). Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [Dataset]. http://doi.org/10.57761/gk3g-wc33
    Available download formats
    stata, csv, application/jsonl, arrow, parquet, sas, spss, avro
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Doerr School of Sustainability Data Repository
    Time period covered
    Jun 27, 2024
    Description

    Abstract

    S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, desk, chair, bookcase, and board).

    Methodology

    The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of the colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870, and 1100 square meters respectively (6020 square meters in total). These areas show diverse properties in architectural style and appearance and mainly include office areas and educational and exhibition spaces; conference rooms, personal offices, restrooms, open spaces, lobbies, stairways, and hallways are commonly found therein. The point clouds are generated automatically, without any manual intervention, using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements: structural elements (ceiling, floor, wall, beam, column, window, and door) and commonly found items and furniture (table, chair, sofa, bookcase, and board).

    [Image: S3DIS.png]

    Important Information

  2. Data from: Wide range screening of algorithmic bias in word embedding models...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Apr 7, 2020
    Cite
    David Rozado (2020). Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types [Dataset]. http://doi.org/10.5061/dryad.rbnzs7h7w
    Available download formats
    zip
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    Dryad
    Authors
    David Rozado
    Time period covered
    Mar 22, 2020
    Description

    This dataset collects several popular pre-trained word embedding models:

    - Word2vec Skip-Gram trained on the Google News corpus (100B tokens) https://code.google.com/archive/p/word2vec/

    - GloVe trained on Wikipedia 2014 + Gigaword 5 (6B tokens) http://nlp.stanford.edu/data/glove.6B.zip

    - GloVe trained on a Twitter corpus of 2B tweets (27B tokens) http://nlp.stanford.edu/data/glove.twitter.27B.zip

    - GloVe trained on Common Crawl (42B tokens) http://nlp.stanford.edu/data/glove.42B.300d.zip

    - GloVe trained on Common Crawl (840B tokens) http://nlp.stanford.edu/data/glove.840B.300d.zip

    - FastText trained with subword information on Wikipedia 2017, the UMBC webbase corpus, and the statmt.org news dataset (16B tokens) https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip

    - FastText trained with subword information on Common Crawl (600B tokens) https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip
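
    As a rough illustration of how the text-format vectors in the list above might be read once unzipped, here is a minimal sketch. It assumes the usual layout of GloVe and fastText `.vec` files (one token per line followed by space-separated floats, with fastText adding a "count dimension" header line). The function name `read_text_vectors` is hypothetical, and the sketch does not handle binary files, which is how the word2vec Google News vectors are typically distributed.

```python
def read_text_vectors(lines):
    """Parse text-format embeddings into a {token: [float, ...]} dict.

    Hypothetical helper: `lines` is any iterable of strings, e.g. an open file.
    """
    vectors = {}
    for index, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # Assumption: a first line with exactly two fields is a fastText-style
        # header ("<vocab_size> <dimension>") and carries no vector.
        if index == 0 and len(parts) == 2:
            continue
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny in-memory sample standing in for a real file.
sample = ["2 3", "cat 0.1 0.2 0.3", "dog 1 2 3"]
embeddings = read_text_vectors(sample)
print(sorted(embeddings))
```

    With a real file the same function could be called as `read_text_vectors(open(path, encoding="utf-8"))`. Tokens that themselves contain spaces would need extra handling that this sketch omits.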

  3. Stanford University HIV Drug Resistance Database

    • neuinfo.org
    • scicrunch.org
    • +1 more
    Updated Oct 23, 2025
    Cite
    (2025). Stanford University HIV Drug Resistance Database [Dataset]. http://identifiers.org/RRID:SCR_006631
    Dataset updated
    Oct 23, 2025
    Description

    The Stanford University HIV Drug Resistance Database (HIVDB) is a curated public database designed to represent, store, and analyze the different forms of data underlying HIV drug resistance. HIVDB has three main types of content: (1) database queries and references, (2) interactive programs, and (3) educational resources. Database queries are designed primarily for researchers studying HIV drug resistance. The interactive programs and educational resources are designed for both researchers and those wishing to learn more about HIV drug resistance.

    1. DATABASE QUERY AND REFERENCE PAGES

    Genotype-Treatment Correlations

    The Genotype-Treatment section of the database links to 15 interactive query pages that explore the relationship between treatment with HIV-1 antiretroviral drugs (ARVs) and mutations in HIV reverse transcriptase (RT), protease, and integrase. There are five types of interactive query pages:

    • Treatment Profiles (Protease and RT inhibitors)
    • Mutation Profiles (Protease and RT mutations)
    • Detailed Treatment Queries (Protease, RT, and integrase inhibitors)
    • Detailed Mutation Queries (Protease, RT, and integrase mutations)
    • Mutation Prevalence According to Subtype and Treatment

    Genotype-Phenotype Correlations

    The main page of the Genotype-Phenotype Correlations section links to four interactive query pages: three dynamically updated data summaries and one regularly updated downloadable dataset.

    • Drug Resistance Positions: query for levels of resistance associated with known drug resistance mutations
    • Detailed Phenotype Queries: queries for levels of resistance associated with individual mutations or mutation combinations at all positions of protease, RT, and integrase
    • Patterns of Drug Resistance Mutations
    • Downloadable Reference Dataset

    Genotype-Clinical Correlations

    This part of the database has two main sections:

    • Clinical Trials Datasets
    • Summaries of Clinical Studies

    References

    This part of the database has two main sections: one with summaries of the data from each of the references in HIVDB and one in which every primate immunodeficiency virus sequence in GenBank is annotated according to its presence or absence in HIVDB.

    • Studies in HIVDB
    • GenBank HIVDB

    New Submissions

    Approximately every three months, the New Submissions section lists the studies that have been entered into HIVDB. The study title links to the introductory page of the study in the References section.

    Database Statistics (http://hivdb.stanford.edu/pages/HIVdbStatistics.html)

    2. INTERACTIVE PROGRAMS

    HIVDB has seven main interactive programs.

    1. HIVdb Program
      • Mutation List Analysis
      • Sequence Analysis
      • HIVdb Output
      • Sierra Web Service
      • Release Notes
      • Algorithm Specification Interface (ASI)
    2. HIValg Program
    3. HIVseq Program
    4. Calibrated Population Resistance (CPR) tool
    5. Mutation ARV Evidence Listing (MARVEL)
    6. ART-AiDE
    7. Rega HIV-1 Subtyping tool

    Three programs in the HIV Drug Resistance Database share a common code base: HIVseq, HIVdb, and HIValg.

    HIVseq accepts user-submitted protease, RT, and integrase sequences, compares them to the consensus subtype B reference sequence, and uses the differences as query parameters for interrogating the HIV Drug Resistance Database (Shafer, D Jung, & B Betts, Nat Med 2000; Rhee SY et al. AIDS 2006). The query result provides users with the prevalence of protease, RT, and integrase mutations according to subtype and protease inhibitor (PI), nucleoside RT inhibitor (NRTI), non-nucleoside RT inhibitor (NNRTI), and integrase inhibitor (INI) exposure. This allows users to detect unusual sequence results immediately so that the person doing the sequencing can check the primary sequence output while it is still on the desktop. In addition, unexpected associations between sequences or isolates can be discovered by immediately retrieving data on isolates sharing one or more mutations with the sequence.

    There are three ways in which the HIVdb program can be used: (i) entering a list of protease and RT mutations, (ii) entering a complete sequence containing protease, RT, and/or integrase, and (iii) using a Web Service. HIVdb is an expert system that accepts user-submitted HIV-1 pol sequences and returns inferred levels of resistance to 20 FDA-approved ARV drugs, including 8 PIs, 7 NRTIs, 4 NNRTIs, and, with this update, one INI. In the HIVdb system, each HIV-1 drug resistance mutation is assigned a drug penalty score and a comment; the total score for a drug is derived by adding the scores of each mutation associated with resistance to that drug. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance.

    HIValg is designed for users interested in comparing the results of different algorithms or in comparing and evaluating existing and newly developed algorithms. The ability to develop new algorithms that can be run on the HIV Drug Resistance Database depends on the Algorithm Specification Interface (ASI) compiler (Shafer & Betts JCM 2003).

    Submission of Sequences and Mutations

    For each of the three programs, sequences can be entered using either the Sequence Analysis Form or the Mutation List form.

    3. EDUCATIONAL RESOURCES

    HIVDB contains several regularly updated sections summarizing data linking RT, protease, and integrase mutations and antiretroviral drugs (ARVs). These sections include (i) tabular summaries of the major mutations associated with each ARV class, (ii) detailed summaries of the major, minor, and accessory mutations associated with each ARV, (iii) the comments used by the HIVdb program, (iv) the scores used by the HIVdb program, (v) clinical studies in which baseline drug resistance mutations have been correlated with the virological response (clinical outcome) to a specific ARV, (vi) mutations that can be used for drug resistance surveillance, and (vii) a two-page PDF handout.

    1. Drug Resistance Summaries
      • Tabular Drug Resistance Summaries by ARV Class
      • Detailed Drug Resistance Summaries by ARV
      • Drug Resistance Mutation Comments Used by the HIVdb Program
      • Drug Resistance Mutation Scores Used by the HIVdb Program
      • Genotype-Clinical Outcome Correlation Studies
    2. Surveillance Drug-Resistance Mutation List Section
    3. PDF Handout

    Grant Support

    1. National Institute of Allergy and Infectious Diseases (NIAID, NIH): Online HIV Drug Resistance Database (PI: Robert W. Shafer, MD, 1R01AI68581-01A1), 04/01/06 - 3/31/11
    2. National Institute of Allergy and Infectious Diseases (NIAID, NIH) supplement to the grant Identification of Multidrug-Resistant HIV-1 Isolates (PI: Robert W. Shafer, MD, AI46148-01): Supplement provided 1999-2005.
    3. NIH/NIGMS Program Project on AIDS Structural Biology Program Project: Targeting Ensembles of Drug Resistant Protease Variants (PI: Celia Schiffer, PhD, University of Massachusetts): 2002-2007
    4. University-wide AIDS Research Program (CR03-ST-524). Community collaborative award: Optimizing Clinical HIV Genotypic Resistance Interpretation: Principal Investigators: Robert W. Shafer, MD and W. Jeffrey Fessel MD (Kaiser Permanente Medical Care Program): 2004-2005
    5. Stanford University Bio-X Interdisciplinary Initiative: HIV Gene Sequence Analysis for Drug Resistance Studies: A Pharmacogenetic Challenge Principal Investigators: Robert W. Shafer, MD and Daphne Koller, Ph.D. (Computer Science): 2000-2002
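
    The additive penalty scoring that the description attributes to the HIVdb program (sum the per-mutation scores for a drug, then map the total to one of five resistance levels) can be sketched as follows. This is only an illustration: the drug names, mutation names, and penalty values are invented, and the numeric cut-offs are assumptions, since the description names the levels but gives no thresholds. The real scores and rules are those published by HIVDB.

```python
# Hypothetical penalty table, invented for illustration; NOT real HIVdb scores.
PENALTIES = {
    "DRUG_A": {"MUT_1": 60, "MUT_2": 15},
    "DRUG_B": {"MUT_2": 10, "MUT_3": 5},
}

# Assumed cut-offs (the description lists the five levels, not the thresholds).
LEVELS = [
    (60, "high-level resistance"),
    (30, "intermediate resistance"),
    (15, "low-level resistance"),
    (10, "potential low-level resistance"),
]


def total_score(drug, mutations):
    """Add the penalty of every listed mutation that is scored for this drug."""
    return sum(PENALTIES[drug].get(mutation, 0) for mutation in mutations)


def resistance_level(score):
    """Map a total drug score to the first level whose cut-off it reaches."""
    for threshold, label in LEVELS:
        if score >= threshold:
            return label
    return "susceptible"


observed = ["MUT_2", "MUT_3"]
for drug in PENALTIES:
    score = total_score(drug, observed)
    print(drug, score, resistance_level(score))
```

    Keeping the penalty table and the cut-offs as plain data mirrors the description's separation between per-mutation scores and the reported level, so either can be swapped without touching the functions.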

  4. Optum ZIP5 OMOP

    • redivis.com
    application/jsonl +7
    Updated Mar 3, 2021
    Cite
    Stanford Center for Population Health Sciences (2021). Optum ZIP5 OMOP [Dataset]. http://doi.org/10.57761/e54r-bg69
    Available download formats
    csv, avro, sas, spss, arrow, parquet, application/jsonl, stata
    Dataset updated
    Mar 3, 2021
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Description

    Abstract

    Optum ZIP5 v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/). This dataset covers 2003-Q1 to 2020-Q2.

    Section 10

    A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

    • It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.
    • It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.

    For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

    Conventions

    • Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.
    • Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.
    • Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.

    The text above is taken from the OMOP CDM v5.3 Specification document.
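
    The 30-day Persistence Window rule for Condition Eras can be illustrated with a small sketch. This is not the standardized algorithm that the specification refers to, only a simplified reading of it: the function name `build_condition_eras` is hypothetical, the input is assumed to be the occurrences of one Person and one condition_concept_id, and a missing end date is assumed to fall back to the start date.

```python
from datetime import date, timedelta

PERSISTENCE_WINDOW = timedelta(days=30)  # window stated in the conventions


def build_condition_eras(occurrences):
    """Merge (start_date, end_date) pairs into (era_start, era_end, count).

    Hypothetical helper for ONE person and ONE condition_concept_id.
    """
    eras = []
    for start, end in sorted(occurrences, key=lambda pair: pair[0]):
        end = end or start  # assumption: no end date means a one-day occurrence
        if eras and start <= eras[-1][1] + PERSISTENCE_WINDOW:
            # Within 30 days of the running era: extend it.
            era_start, era_end, count = eras[-1]
            eras[-1] = (era_start, max(era_end, end), count + 1)
        else:
            # Outside the window: the previous era is closed, start a new one.
            eras.append((start, end, 1))
    return eras


visits = [
    (date(2020, 1, 5), None),   # PCP visit
    (date(2020, 1, 25), None),  # specialist visit, 20 days later
    (date(2020, 6, 1), None),   # much later occurrence
]
for era in build_condition_eras(visits):
    print(era)
```

    In this example the PCP and specialist visits fall into a single era with two occurrences, while the June occurrence starts a new era because it lies more than 30 days after the previous one.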

    Section 8

    The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the "Condition" Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.

    Conventions

    • There is one record for each Domain. The domains are defined by the tables and fields in the OMOP CDM that can contain Concepts describing all the various aspects of the healthcare experience of a patient.
    • The domain_id field contains an alphanumerical identifier, that can also be used as the abbreviation of the Domain.
    • The domain_name field contains the unabbreviated names of the Domain.
    • Each Domain also has an entry in the Concept table, which is recorded in the domain_concept_id field. This is for purposes of creating a closed Information Model, where all entities in the OMOP CDM are covered by unique Concept.

    The text above is taken from the OMOP CDM v5.3 Specification document.
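
    To make the role of a Domain concrete, the sketch below models a DOMAIN record and checks whether a Concept's Domain matches the field it is being stored in. The three field names on `Domain` come from the conventions above; everything else (the `FIELD_TO_DOMAIN` map, the `concept_allowed` helper, and the placeholder concept ids) is hypothetical and not taken from the OMOP vocabularies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    domain_id: str          # alphanumerical identifier, e.g. "Condition"
    domain_name: str        # unabbreviated name of the Domain
    domain_concept_id: int  # the Domain's own entry in the Concept table


# Placeholder rows; the concept ids are invented, not real vocabulary values.
DOMAINS = {
    d.domain_id: d
    for d in (Domain("Condition", "Condition", 1001), Domain("Drug", "Drug", 1002))
}

# Hypothetical map from a standardized field to the Domain it accepts.
FIELD_TO_DOMAIN = {
    "condition_concept_id": "Condition",
    "drug_concept_id": "Drug",
}


def concept_allowed(field_name, concept_domain_id):
    """Return True if a Concept of this Domain may be stored in the field."""
    expected = FIELD_TO_DOMAIN.get(field_name)
    return expected in DOMAINS and expected == concept_domain_id


print(concept_allowed("condition_concept_id", "Condition"))
print(concept_allowed("condition_concept_id", "Drug"))
```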

    Section 12

    A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.
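
    As an illustration that is not part of the specification text, combining Drug Exposures into Drug Eras might be sketched as below. Several things are assumptions: the function name `build_drug_eras` is hypothetical, the input is the exposures of one Person and one ingredient with known end dates, the Persistence Window length is passed in by the caller, the "last" exposure is taken to be the one that starts latest, and drug-free days are counted as the days strictly between one exposure's end and the next one's start.

```python
from datetime import date, timedelta


def build_drug_eras(exposures, persistence_window_days):
    """Combine (start_date, end_date) exposures into eras with Gap Days.

    Hypothetical helper for ONE person and ONE active ingredient.
    """
    window = timedelta(days=persistence_window_days)
    eras = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1]["end"] + window:
            era = eras[-1]
            # Drug-free days between the running era end and this exposure.
            era["gap_days"] += max((start - era["end"]).days - 1, 0)
            # Assumption: the era ends with the latest-starting exposure,
            # i.e. leftover supply from earlier exposures is abandoned.
            era["end"] = end
            era["exposure_count"] += 1
        else:
            eras.append(
                {"start": start, "end": end, "exposure_count": 1, "gap_days": 0}
            )
    return eras


exposures = [
    (date(2020, 1, 1), date(2020, 1, 30)),
    (date(2020, 2, 5), date(2020, 3, 5)),
    (date(2020, 6, 1), date(2020, 6, 30)),
]
# 30 days is an assumed window length used only for this example.
for era in build_drug_eras(exposures, persistence_window_days=30):
    print(era)
```

    Here the first two exposures form one era with five drug-free days (31 January to 4 February), and the June exposure starts a separate era because it begins more than 30 days after the previous one ended.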

    Conventions

    • Drug Eras are derived from records in the DRUG_EXPOSURE table using a standardized algorithm.
    • Each Drug Era corresponds to one or many Drug Exposures that form a continuous interval and contain the same Drug Ingredient (active compound).
    • The drug_concept_id field only contains Concepts that have the concept_class 'Ingredient'. The Ingredient is derived from the Drug Concepts in the DRUG_EXPOSURE table that are aggregated into the Drug Era record.
    • The Drug Era Start Date is the start date of the first Drug Exposure.
    • The Drug Era End Date is the end date of the last Drug Exposure. The End Date of each Drug Exposure is either taken from the field drug_exposure_end_date or, as it is typically not available, inferred using the following rules:
    • The Gap Days determine how many total drug-free days are observed between all Drug Exposure events that contribute to a DRUG_ERA record. It is assumed that the drugs are "not stockpiled" by the patient, i.e. that if a new drug prescription or refill is observed (a new DRUG_EXPOSURE record is written), the remaining supply from the previous events is abandoned.
    • The difference between Persistence Window and Gap Days is that the former is the maximum drug-free time allowed between two subsequent DRUG_EXPOSURE records, while the latter is the sum of actual drug-free days for the given Drug Era under the abo

97 scholarly articles cite the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS).