Synthetic data - Genome in a Bottle
In May 2015, the National Institute of Standards and Technology (NIST) released its first "genome in a bottle," a reference sample of DNA for validating human genome sequences. This so-called truth sequence comes from a decades-old sample (the NA12878 cell line) donated by a Utah woman for other research purposes; over the years it has become one of the most studied, and hence best-characterized, human samples. Seeing genomic medicine move toward mainstream healthcare, researchers at NIST recognized the need for a reference human genome and assembled a private-public consortium in 2012 to create one. As detailed in a 2014 Nature Biotechnology paper (Nat. Biotechnol. 32, 246–251, 2014), the group integrated and arbitrated among sequences from 14 data sets, five sequencing technologies, seven read mappers and three variant callers.
- Type: Other
- Archiver: European Genome-Phenome Archive (EGA)
Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.
Dataset ID | Description | Technology | Samples |
---|---|---|---|
EGAD00001008095 | | | 3 |
EGAD00001008096 | | Illumina HiSeq 2500 | 3 |
EGAD00001008097 | | | 3 |
Publications | Citations |
---|---|
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 32: 2014 246-251 | 432 |
Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 3: 2016 160025 | 362 |
Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol 37: 2019 555-560 | 165 |
An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 37: 2019 561-566 | 162 |