1. 5236 British Pakistani/British Bangladeshi adults from East London Genes and Health (ELGH)
2. 2624 British South Asian mothers from Born in Bradford (mostly Pakistani) (BiB)
3. 1061 British South Asian adults from Birmingham (mostly Pakistani) (Birm)
All of the Birmingham and most of the Born in Bradford samples were previously sequenced as part of PMID: 26940866.
In the sample list file, the columns of interest to most people will be:
vcf.id - sample ID from the vcf
cohort - which cohort they're in
sex.assigned - sex inferred from coverage on the X and Y chromosomes. Individuals for whom this did not match their reported sex have been discarded
total, chrX and chrY - coverage within bait regions across all chromosomes, chrX and chrY respectively
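As a rough illustration of working with these columns, the Python sketch below tallies samples per cohort. It assumes the sample list is a tab-delimited text file with a header row containing the column names listed above. The delimiter, the function name count_by_cohort, the cohort labels and the example rows are all assumptions made for illustration, not details taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_by_cohort(handle, delimiter="\t"):
    """Count samples per value of the 'cohort' column.

    Assumes a header row that includes a 'cohort' column; the tab
    delimiter is a guess about the file format.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row["cohort"] for row in reader)


# Made-up rows for illustration only (not real samples or real values).
example = io.StringIO(
    "vcf.id\tcohort\tsex.assigned\ttotal\tchrX\tchrY\n"
    "S1\tELGH\tF\t40.1\t39.5\t0.2\n"
    "S2\tELGH\tM\t38.7\t19.8\t18.9\n"
    "S3\tBiB\tF\t41.3\t40.2\t0.1\n"
)
counts = count_by_cohort(example)
print(counts)
```

For the real file, an open file handle would replace the in-memory example, and the delimiter may need adjusting.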
Mapping was done with bwa-mem and variant calling was carried out with GATK HaplotypeCaller. We removed variant sites for which any of the following was true:
SNPs: "QD < 2.0 || FS > 30 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
Indels: "QD < 2.0 || FS > 30 || ReadPosRankSum < -20.0"
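The filter expressions above can be read as "remove the site if any one condition holds". The Python sketch below restates that logic on a dictionary of INFO annotations, for readers who want to check individual sites against the thresholds. It is not the pipeline that was actually run: the function name is_removed and the rule tables are hypothetical, and the choice to skip annotations that are missing from a site is an assumption.

```python
# Thresholds copied from the filter expressions quoted above.
SNP_RULES = [
    ("QD", "<", 2.0),
    ("FS", ">", 30.0),
    ("MQ", "<", 40.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]
INDEL_RULES = [
    ("QD", "<", 2.0),
    ("FS", ">", 30.0),
    ("ReadPosRankSum", "<", -20.0),
]


def is_removed(info, is_snp):
    """Return True if any rule for this variant type is met.

    `info` maps annotation names to numeric values. Annotations that
    are absent are skipped (an assumption, not a documented behaviour
    of the original filtering step).
    """
    rules = SNP_RULES if is_snp else INDEL_RULES
    for key, op, threshold in rules:
        value = info.get(key)
        if value is None:
            continue
        if op == "<" and value < threshold:
            return True
        if op == ">" and value > threshold:
            return True
    return False


# A SNP with low QD is removed; a clean SNP is kept.
print(is_removed({"QD": 1.5, "FS": 10.0, "MQ": 60.0}, is_snp=True))
print(is_removed({"QD": 10.0, "FS": 10.0, "MQ": 60.0}, is_snp=True))
```

Note that the indel rules omit the MQ and MQRankSum conditions and use a looser ReadPosRankSum threshold, so the same annotation values can pass as an indel but fail as a SNP.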
Who controls access to this dataset
For each dataset that requires controlled access, there is a corresponding Data Access Committee (DAC) that determines access permissions. Access to the actual data files is not managed by the EGA. If you need to request access to this dataset, please contact:
Wellcome Trust Sanger Institute
Contact person: Data Sharing
Email: datasharing [at] sanger [dot] ac [dot] uk
More details: EGAC00001000205