
SG10K_Pilot Dataset: Whole genome sequencing data of 4810 individuals from Singapore

Genomic data obtained from the joint processing and variant calling of 4,810 individuals from Singapore. VCF files are split by chromosome (chromosomes 1-22 plus X), and each file covers all 4,810 individuals. Self-reported ethnicity is found in the "Region" column of the metadata file.
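As a rough illustration of how the "Region" column might be used, the sketch below tallies individuals per self-reported ethnicity. It assumes the metadata file is tab-delimited with a header row; the delimiter, the `SampleID` column, and the group labels in the sample data are hypothetical and are not taken from the dataset documentation.

```python
import csv
import io
from collections import Counter


def count_by_region(handle, column="Region", delimiter="\t"):
    """Count metadata rows per value of the self-reported ethnicity column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in metadata header")
    # Treat a missing cell as an empty string so short rows do not crash.
    return Counter((row[column] or "").strip() for row in reader)


# Hypothetical three-row excerpt, used only to exercise the function.
# The real file layout and labels may differ (assumption).
sample = io.StringIO(
    "SampleID\tRegion\n"
    "S1\tGroupA\n"
    "S2\tGroupB\n"
    "S3\tGroupA\n"
)
counts = count_by_region(sample)
print(counts)
```

With the real metadata file, the same function could be called on an open file handle, provided the file actually uses this layout.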


The Singapore National Precision Medicine Programme will be generating whole genome sequencing data for ~10,000 Singaporeans.

Data Access Committee for SG10K_Pilot Dataset

Version 1.0
Date: 10 October 2019
Authors: Patrick Tan, Neerja Karnani, John Chambers, Sim Xueling

Purpose

The National Precision Medicine Data Access Committee (NPM DAC) manages the access and usage of the SG10K_Pilot Dataset, deposited at the European Genome-phenome Archive (EGA), which is hosted by EMBL-EBI and CRG, under accession number EGAS00001003875. The SG10K_Pilot Investigators shall remain the rightful owners of the datasets and reserve the right to use the data for their own research and collaborations. This document summarises the data access policies and procedures that enable researchers to access and use the genomic data generated in the SG10K_Pilot study.

Definitions

“APMO” refers to the A*STAR Precision Medicine Office, which is the Program Office housed under A*STAR Biomedical Research Council (BMRC) providing secretariat support to the DAC.
“CRG” refers to the Centre for Genomic Regulation.
“Data Access Committee” (DAC) refers to the committee managing access to the available SG10K_Pilot Datasets.
“EGA” refers to the European Genome-phenome Archive (www.ega-archive.org).
“EMBL-EBI” refers to the European Bioinformatics Institute.
“NPM” refers to the Singapore National Precision Medicine Program.
“PMSC” refers to the Precision Medicine Steering Committee.
“SG10K_Pilot Dataset” refers to the dataset from the Cell publication (EGAS00001003875), publicly available via the European Genome-phenome Archive (EGA).
“SG10K_Pilot Investigators” refers to the investigators involved in the generation of the SG10K_Pilot Datasets and the institutional PIs of each cohort contributing to the SG10K_Pilot Datasets.

SG10K_Pilot Dataset

The SG10K_Pilot Dataset, published in Cell and archived at the EGA (www.ega-archive.org), comprises whole genome sequencing data for 4,810 Singaporeans. All datasets have been pseudonymised and are therefore considered de-identified, as described in the paper.
IRB approval for use of the SG10K_Pilot Dataset

IRB approvals are in place to enable the data to be used for biomedical research that is in the public interest. Applications will be judged against these existing IRB approvals to assess whether the proposed use is acceptable and permitted. In exceptional circumstances, a new IRB approval may be needed to enable the work proposed.

Data Access Policy

1) Data access requests should be submitted for review by the Data Access Committee, using the SG10K_Pilot Data Access Form.
2) The Committee secretariat will be responsible for correspondence with applicants, co-ordinating meetings, distributing applications for review in advance of meetings, keeping minutes on all discussions, recording outcomes, communicating outcomes to the applicant and liaising with appropriate Data Management Teams.
3) The Committee will maintain an active internal registry of data access requests and monitor progress. A register of approved applications and SG10K_Pilot primary publications will be made publicly available through the study website.
4) In the initial phase, there will be no data access fees. In the future, the PMSC may consider introducing reasonable data access fees to applicants (cost-recovery), to help ensure the sustainability of maintaining the data for use by external researchers.
5) The DAC recognizes that it is impossible for these policies to fully cover all possible publication scenarios and manuscripts. As such, the Committee reserves the right to modify these guidelines for publications on a case-by-case basis, based on the specifics of the proposed data access request.
Obligations of Data Access Requestors

1) Applicants must:
• Provide evidence that they are bona fide researchers (including ORCID ID)
• Submit a study Data Access Request form (Annex A) and obtain DAC approval before using the data for the research proposed
• Ensure data are held securely, and used only by applicants listed on the proposal, or by the researchers employed in the teams that they lead at their respective host institutions
• Use the data only for the experiment described in the application
• Submit an application to extend the use, if necessary, for further review by the DAC
• Agree to make no attempt to re-identify individuals in the study
• Provide an annual report on study progress
• Destroy the data at the end of the study, or when asked to do so by the DAC
• Acknowledge the contributions of SG10K_Pilot Dataset investigators in all grants and publications arising, using the approved text for methods and acknowledgments
• Agree to any other conditions of use specified by the DAC

2) Applicants agree to refer to the “SG10K_Pilot” Dataset where these data have been used, and to indicate the data freeze used in their manuscript. Applicants also agree to include the SG10K_Pilot acknowledgment statement and reference our Cell paper [DOI: 10.1016/j.cell.2019.09.019] (illustrated below).

Acknowledgment statement

“We thank the “SG10K_Pilot Investigators” for providing the SG10K_Pilot data (EGAS00001003875). The data from the “SG10K_Pilot Study” reported here were obtained from EGA. This manuscript was not prepared in collaboration with the “SG10K_Pilot Study” and does not necessarily reflect the opinions or views of the “SG10K_Pilot Study”.”

Publication

Wu D, Dou J, Chai X, Bellis C, Wilm A, Shih CC, Soon W, Bertin N, Lin CB, Khor CC, DeGiorgio M, Cheng S, Bao L, Karnani N, Hwang W, Davila S, Tan P, Shabbir A, Moh A, Tan EK, Foo JN, Goh LL, Leong KP, Foo R, Lam C, Richards AM, Cheng CY, Aung T, Wong TY, Ng HH, Liu JJ, Wang C and SG10K Consortium. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell. 2019 Oct;(e-pub ahead of print – full reference to be provided when available).

3) In situations where non-compliance with these policies is discovered, the DAC reserves the right to block submission of manuscripts, and may consider contacting relevant journals to raise concerns.

NPM Data Access Committee: membership and terms of reference

John Chambers (Chair, LKC)
Sim Xueling (Deputy Chair, NUS/NUHS)
Neerja Karnani (Deputy Chair, GUSTO-kids)
Cheng Ching-Yu (SingHealth)
Shyam Prabhakar (A*STAR)
Leong Khai Pang (TTSH)

1. The Data Access Committee (DAC) will review and recommend data access requests to SG10K_Pilot Datasets.
2. The DAC will report to the Precision Medicine Steering Committee, and the Steering Committee may revise the DAC arrangements and membership as circumstances require.
3. The DAC responsibilities will include:
• Review applications for access to study data.
• Ensure requests are consistent with applicable study data and sample access policies.
• Ensure applicants are appropriate researchers.
• Ensure that applicants agree to the conditions of data and sample sharing.
• Maintain a register of applications, and their outcomes.
• Report regularly to the Precision Medicine Steering Committee on the applications process and outcomes.
4. The DAC will require at least 50% of members to be present to be quorate. Approvals should be based on majority voting. In the event of a tie, the Chairman reserves the right to cast the deciding vote.
5. Committee members agree to make best-faith attempts to attend regular meetings related to the evaluation and monitoring of publications, either in-person or by teleconference. It is envisaged that Committee meetings will occur on a quarterly basis.
6. Committee meetings will be coordinated by the Chair or Deputy Chairs.
7. Committee members are not allowed to delegate their responsibilities to other individuals without approval of the overall Committee.
8. Committee meetings will be supported by APMO, and may be attended by observers related to the NPM effort.
9. Committee members agree to declare any conflicts-of-interest (either real or perceived) related to specific data requests.
10. Committee members agree to respond in reasonable time to emails and electronic communications related to the evaluation and monitoring of data access requests.
11. Committee members commit to a timely response to applications for data access. A guideline timeframe is 1 month. In the absence of reply, ‘no objection’ will be inferred.
12. In the absence of a clear consensus, the Committee chairman reserves the right to make the final decision.
13. Members agree that Committee decisions will be documented with clear, recorded feedback and justifications.
14. Members agree to maintain compliance with the DAC guidelines and to be unbiased in the review process.
15. Committee members agree to review and decide on potential amendments to the publication guidelines, as needed.
16. Committee members agree to play an active role in resolving conflicts, in accordance with the overall spirit and mission of NPM.
17. Members agree to alert the Committee if they have concerns related to data access requests (e.g. non-compliance, data falsification, authorship conflicts, conflicts-of-interest, etc.).

SG10K_Pilot Data Access Form

For official use
Request received date:
Request number:

Section A: Requestor details

A1. Full name:
A2. Designation:
A3. Department / organisation:
A4. Email address:
A5. Phone number:
A6. ORCID:
A7. Date of request:

Section B: Details of request

B1. Title of research study:
B2. Proposed start date:
B3. Proposed end date:
B4. Key co-investigators (suggest limit to 5-10)

No.   Name   Institution
1
2
3
4
5
6
7
8
9
10

B5. Does this request relate to any previous data or samples that you have requested from DAC?
☐ No, this is a first request/this has no connection with previous request(s).
☐ Yes. Please state request form reference number and specify how they are related:
B6. Main Research Question (~200 words)
B7. Specific Aims (~200 words)
B8. Background (~500 words)
B9. Overview of study design and analysis plan (~500 words)
B10. Key references (up to 10)

Section C: Acknowledgment by applicant

I confirm that the information provided above is true and accurate, and I agree to comply with and be bound by NPM terms and conditions, including:
• I agree:
i. to ensure data are held securely and used only by the approved, named personnel for the purposes described in the application,
ii. to make no attempt to re-identify individuals in the study, and
iii. to destroy the data at the end of the study, or when asked to do so by the DAC.
• I agree to include the acknowledgement statement provided by the DAC in the acknowledgement sections of manuscripts and cite the related Cell publication in manuscripts.
• If I substantially change the proposed use of the data (e.g. change in analysis theme, change in lead or senior authors), I agree to submit a new data access form.
• I agree to have my data request listed on a public website.
• I agree to respect any specific requests made by the DAC in relation to this application.

______________________________________
Signature of Requestor & Date

______________________________________
Signature of Institution Representative & Date

In situations where non-compliance with these policies is discovered, the Committee reserves the right to block submission of manuscripts, and may consider contacting relevant journals to raise concerns. Questions related to publications should be directed to the A*STAR Precision Medicine Office (contact_apmo@hq.a-star.edu.sg).

Section D: NPM Data Access Committee - Application Review Form

D1. Application received
Received date:
Request number:
D2. Discussed at NPM Committee
Presented date:
Presented by:
D3. Data Access Committee members present

Name                            Institution    Present
John Chambers (Chair)           LKC Medicine
Sim Xueling (Deputy Chair)      NUS/NUHS
Neerja Karnani (Deputy Chair)   SICS
Cheng Ching-Yu                  SingHealth
Shyam Prabhakar                 A*STAR
Leong Khai Pang                 TTSH

D4. Summary of discussion:
D5. Decision: Approve / Revise and resubmit / Reject
D6. Co-author requirements:
D7. Additional comments or requirements for applicants:
D8. Draft of outcome circulated within Data Access Committee
Circulation date:
Circulated by:
D9. Outcome finalized by Data Access Committee
Final decision date:
Recorded by:
D10. Outcome communicated to applicant
Communication date:
Communicated by:
D11. Outcome passed to Data Management Team
DMT date:
Communicated by:
D12. Comments

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID Study Title Study Type
EGAS00001003875 Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Located in
EGAF00002724018 vcf.gz 3.8 GB
EGAF00002724019 vcf.gz 1.4 GB
EGAF00002724020 vcf.gz 1.2 GB
EGAF00002724021 vcf.gz 1.2 GB
EGAF00002724022 vcf.gz 926.9 MB
EGAF00002724023 vcf.gz 1.0 GB
EGAF00002724024 vcf.gz 1.0 GB
EGAF00002724025 vcf.gz 998.8 MB
EGAF00002724026 vcf.gz 624.5 MB
EGAF00002724027 vcf.gz 1.8 GB
EGAF00002724028 vcf.gz 1.5 GB
EGAF00002724029 vcf.gz 1.5 GB
EGAF00002724030 vcf.gz 1.3 GB
EGAF00002724031 vcf.gz 756.9 MB
EGAF00002724032 vcf.gz 689.3 MB
EGAF00002724033 vcf.gz 670.5 MB
EGAF00002724034 vcf.gz 579.7 MB
EGAF00002724035 vcf.gz 591.1 MB
EGAF00002724036 vcf.gz 489.3 MB
EGAF00002724037 vcf.gz 460.9 MB
EGAF00002724038 vcf.gz 300.9 MB
EGAF00002724039 vcf.gz 289.4 MB
EGAF00002724040 vcf.gz 1.6 GB
23 Files (24.7 GB)
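
The stated total can be cross-checked against the listed file sizes. The short sketch below sums the sizes exactly as they appear in the table; it assumes decimal units (1 GB = 1000 MB), which is a guess about how the sizes are reported, and since the listed values are already rounded, the sum is only approximate.

```python
# Sizes copied from the file table, in listing order.
sizes = [
    "3.8 GB", "1.4 GB", "1.2 GB", "1.2 GB", "926.9 MB", "1.0 GB",
    "1.0 GB", "998.8 MB", "624.5 MB", "1.8 GB", "1.5 GB", "1.5 GB",
    "1.3 GB", "756.9 MB", "689.3 MB", "670.5 MB", "579.7 MB", "591.1 MB",
    "489.3 MB", "460.9 MB", "300.9 MB", "289.4 MB", "1.6 GB",
]

# Assumption: decimal units, so 1 GB = 1000 MB.
UNIT_IN_GB = {"GB": 1.0, "MB": 1.0 / 1000}


def total_gb(entries):
    """Sum size strings such as '926.9 MB' and return the total in GB."""
    total = 0.0
    for entry in entries:
        value, unit = entry.split()
        total += float(value) * UNIT_IN_GB[unit]
    return total


total = total_gb(sizes)
print(f"{len(sizes)} files, about {total:.1f} GB")
```

Under the decimal-unit assumption the sum rounds to the 24.7 GB shown in the table footer.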