17 datasets found
  1. Challenging Medically-Relevant Genes Benchmark Set

    Updated Sep 29, 2021
  2. lra-supplemental-HG002-SV.vcf.tar.gz

    Updated Nov 15, 2020
  3. Assembly of human HG002 (GM24385) ONT Q20+ Simplex dataset generated by...

    Updated Nov 26, 2021
  4. The SV callsets of the HG002 human sample produced by cuteSV with multi...

    Updated Oct 9, 2019
  5. Performance of deletion calls for HG002.

    Updated Mar 27, 2020
  6. PopDel identifies medium-size deletions jointly in tens of thousands of...

    Updated Aug 20, 2020
  7. A public-private-academic consortium hosted by NIST to develop reference...

    Updated 2015
  8. Heuristics used to determine HG002 genotypes.

    Updated Jul 1, 2020
  9. Supporting data for "xAtlas: Scalable small variant calling across...

    Updated Nov 14, 2022
  10. Open Genomes Telomere-to-Telomere (T2T) Reference Realignment Project

    Updated Jan 7, 2019
  11. Minigraph pangenome graphs for HPRC year-1 samples

    gz, log, txt
    Updated Feb 25, 2022
  12. Minigraph pangenome graphs for HPRC year-1 samples

    gz, log, tgz, txt
    Updated Apr 27, 2022
  13. Additional file 3: of Comprehensive evaluation of structural variation...

    Updated Jun 4, 2019
  14. Sample graphs and sequences for testing sequence-to-graph alignment

    agc, gz, log
    Updated Feb 5, 2022
  15. Table_1_stLFRsv: A Germline Structural Variant Analysis Pipeline Using...

    Updated Mar 18, 2021
  16. Additional file 1 of ECNano: A cost-effective workflow for target enrichment...

    Updated Mar 5, 2022
  17. Data from: SVXplorer: three-tier approach to identification of structural...

    Updated Feb 3, 2020
National Institute of Standards and Technology (2021). Challenging Medically-Relevant Genes Benchmark Set [Dataset].
Challenging Medically-Relevant Genes Benchmark Set

3 scholarly articles cite this dataset
Dataset updated: Sep 29, 2021
Dataset provided by: National Institute of Standards and Technology


This is CMRG v1.00, a small variant benchmark and structural variant benchmark focused on 273 challenging medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (aka Ashkenazi son). These benchmarks were generated from a trio-based hifiasm v0.11 diploid assembly of HG002, using PacBio HiFi reads from HG002 for assembly and Illumina reads from the parents, HG003 and HG004, for partitioning into phased haplotypes. The benchmark contains VCFs for small and structural variants, along with corresponding benchmark BED files; any position inside a BED region that has no variant in the VCF is asserted to be homozygous reference. We extensively curated the variant calls, excluding any found to be questionable or erroneous. This benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. This benchmark is described in
