Synthetic data - Genome in a Bottle

In May, the National Institute of Standards and Technology (NIST) released its first "genome in a bottle", a reference sample of DNA for validating human genome sequences. This so-called truth sequence comes from the NA12878 cell line, derived from a sample that a Utah woman donated decades ago for other research purposes and that has since become one of the most studied, and hence best-characterized, human samples. Seeing genomic medicine moving toward mainstream healthcare, researchers at NIST recognized the need for a reference human genome and assembled a public-private consortium in 2012 to create one. As detailed in a 2014 Nature Biotechnology paper (Nat. Biotechnol. 32, 246–251, 2014), the group integrated and arbitrated among sequences from 14 data sets, five sequencing technologies, seven read mappers and three variant callers.

Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.

Dataset ID      | Description | Technology          | Samples
EGAD00001008095 |             |                     | 3
EGAD00001008096 |             | Illumina HiSeq 2500 | 3
EGAD00001008097 |             |                     | 3

Publications

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 32: 246-251 (2014).
Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 3: 160025 (2016).
Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol 37: 555-560 (2019).
An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 37: 561-566 (2019).