In November 2023, Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium investigators and the Coordinating Center were informed that the NHGRI leadership for the AnVIL Program had recommended that AnVIL not host Human Genome Diversity Project (HGDP) data due to ethical considerations. Although the platform does not host a public copy, use of HGDP data within a private AnVIL workspace is at the discretion of each individual user. This prompted a reassessment of whether PRIMED should use HGDP data, as the consortium had initially planned to use HGDP samples in reference panels for genetic similarity estimation, and multiple PRIMED sites had already used HGDP data as a reference dataset. Over the following months, the ethical and analytical implications of using HGDP data were discussed by various PRIMED groups, including the Social and Ethical Implications of Polygenic Risk, Methods Development, and Genotype Harmonization Working Groups; the Ancestry Sub-Working Group; the Steering Committee (SC); and the Principal Investigator (PI) session at the Spring 2024 in-person meeting. On May 15, 2024, the SC voted on and approved the following motion:
Until further notice, PRIMED investigators may include HGDP data as genotype reference panels, for PRIMED-related work.
In making this decision, PRIMED PIs carefully weighed both ethical and scientific considerations. The ethical considerations included concerns about the collection and use of HGDP data, particularly the failure to obtain informed consent consistent with current standards from many participants and insufficient community engagement by the research team. On the scientific side, PRIMED sites have primarily used HGDP samples in reference panels for analyses such as genetic similarity estimation. These samples were viewed as important because they include populations not represented in other panels such as the 1000 Genomes Project (1000G), and excluding them could negatively affect Polygenic Risk Score (PRS) performance in specific populations. Balancing the historical context of data collection practices against the current need for comprehensive genetic reference panels posed a significant challenge.
All seven PRIMED sites and the Coordinating Center voted to approve the motion. However, opinions differed in the discussions preceding the formal vote: over the course of several meetings, PIs expressed a range of viewpoints, including dissenting ones. In addition, the NIH program team representatives registered a disapproval vote, citing concerns about data collection practices and the lack of community engagement. The NIH team indicated that the disapproval applied to future use of HGDP data and that there was no expectation to remove HGDP data from completed projects.
The HGDP, launched in 1991, aims to study genetic diversity across human populations and now comprises 948 samples (cultured lymphoblastoid cell lines and DNA) from 54 populations defined by geographic origin [1,2]. The SC recognizes the ethical concerns about the collection and use of HGDP data, including the failure to obtain informed consent consistent with current standards from many participants and insufficient community engagement by the research team. These issues have led to perceptions of exploitation by researchers of European descent, potentially harming continued engagement in research [3].
PRIMED sites have primarily used HGDP samples in reference panels for genetic similarity estimation, genotype imputation, and linkage disequilibrium estimation. The Colorado Center for Personalized Medicine Biobank and the Mass General Brigham Biobank both use a reference panel that includes HGDP samples. HGDP includes samples from populations not represented in 1000G, including populations from the Middle East, Oceania, and multiple regions of Africa, as well as populations in the Americas that, unlike the American populations in 1000G, have not undergone recent admixture. PRIMED investigators expressed concern that excluding HGDP samples could lead to inaccurate genetic ancestry inference in specific populations and, in turn, degrade PRS performance and prediction.
In response to ongoing discussions about HGDP sample use in PRIMED projects, the Methods Development Working Group has initiated a project comparing genetic similarity estimation performance across reference panels with and without HGDP samples. The project aims to determine whether including HGDP samples affects local and global genetic similarity estimates.
As ethical discussions and evaluations of HGDP sample use in PRIMED projects continue, the consortium may re-evaluate the use of HGDP samples in the future.
- PRIMED Steering Committee, approved August 21, 2024
References
1. Cavalli-Sforza, L. L. (2005). The Human Genome Diversity Project: past, present and future. Nature Reviews Genetics, 6(4), 333-340. https://doi.org/10.1038/nrg1596
2. Koenig, Z., Yohannes, M. T., Nkambule, L. L., Zhao, X., Goodrich, J. K., Kim, H. A., Wilson, M. W., Tiao, G., Hao, S. P., Sahakian, N., Chao, K. R., Walker, M. A., Lyu, Y., Rehm, H. L., Neale, B. M., Talkowski, M. E., Daly, M. J., Brand, H., Karczewski, K. J., . . . Martin, A. R. (2024). A harmonized public resource of deeply sequenced diverse human genomes. Genome Research, 34(5), 796-809. https://doi.org/10.1101/gr.278378.123
3. Greely, H. T. (2001). Human genome diversity: What about the other human genome project? Nature Reviews Genetics, 2(3), 222-227. https://doi.org/10.1038/35056071