
Data Sharing Policy

Date Approved: November 15, 2023

Date of Last Update: November 17, 2023

Version: 2.1


Scope

To facilitate access, sharing, and broad usage of datasets with genomic and health measures from diverse ancestry populations within the PRIMED (Polygenic Risk Methods in Diverse Populations) Consortium, the PRIMED Consortium, NHGRI, and NCI have established the Data Sharing Policy (DSP) presented here. The scope of this policy extends to the expectations and timelines for bringing in existing and/or newly generated data from dbGaP, PRIMED Study Sites, and other data contributors. The processes and procedures outlined in the DSP are designed to promote scientific efficiency, synergy, and collaboration by facilitating Consortium-wide cross-study analyses, while protecting study participants’ privacy and respecting participant consent.

The DSP addresses intra-Consortium sharing of contributed individual-level data and summary-level data, as well as derived data products generated in PRIMED. Two initial mechanisms for intra-Consortium sharing are described: Coordinated dbGaP Applications and a Consortium Data Sharing Agreement (CDSA). The DSP also addresses appropriate sharing or release of derived data products generated in PRIMED to the scientific community. Redistribution or re-release of controlled-access (individual- or summary-level) source data (i.e., the data on which derived data are based) outside of the PRIMED Consortium, including to the external scientific community, is not permitted under this version of the DSP.

It is incumbent upon PRIMED investigators to be aware of and follow additional policies that apply to their activities, whether within or beyond the scope of this PRIMED DSP. Such additional policies include the National Institutes of Health (NIH) Genomic Data Sharing (GDS) Policy, Consortium Guidelines for AnVIL Data Access, the Genomic Data User Code of Conduct, and any other local or institution-specific policies.


Policy

Eligibility for Consortium Data Access and Sharing

In order to enter into PRIMED Consortium-wide data sharing circles, an investigator’s name must be included on the PRIMED Eligibility List (EL). Specifically, submitted PRIMED Coordinated dbGaP applications will be disapproved by the NIH DAC(s) if the applicant’s name does not appear on the EL. Similarly, signed CDSAs will not be accepted by the Coordinating Center if the investigator representative (signatory) is not on the EL. PRIMED PIs are listed on the EL by default and should contact the CC to add PRIMED co-Investigators. PRIMED Affiliate Members have the opportunity to enter Consortium data sharing circles; see the Affiliate Membership Policy for details.

The PRIMED Eligibility List includes:

  • Contact PIs and multiple PIs of all Study Sites and the Coordinating Center
  • Co-Investigators of Study Sites and the Coordinating Center, upon request from a contact or multiple PI
  • Affiliate Group PIs who indicate during the Affiliate Membership application process that they wish to enter PRIMED Consortium-wide data sharing circles, upon Steering Committee approval
  • Co-Investigators of Affiliate Groups, upon request from an Affiliate Group PI
  • PIs, multiple PIs, and Co-Investigators of multi-institutional Affiliate Groups (e.g., CDSA DATA AFFILIATE COMPONENT groups), upon request from an Affiliate Group PI (i.e., the Primary PI, not a COMPONENT PI)

Note that EL members do not gain access to Consortium datasets until they are approved via the relevant data access mechanism (e.g., DACs have approved relevant Data Access Requests on a PRIMED coordinated dbGaP application, or a signed and executed CDSA has been submitted to the Coordinating Center).
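
The two-step gate described above, in which EL membership is necessary but not sufficient, can be expressed as a minimal Python sketch. The `Investigator` class, the `has_dataset_access` function, and the approval labels (e.g., `"cdsa"`) are hypothetical illustrations, not part of any PRIMED, dbGaP, or AnVIL system.

```python
from dataclasses import dataclass, field


@dataclass
class Investigator:
    name: str
    on_eligibility_list: bool
    # Hypothetical labels for approvals held, e.g. "cdsa" for an executed CDSA
    # or "dbgap:<study-consent group>" for an approved Data Access Request.
    approvals: set[str] = field(default_factory=set)


def has_dataset_access(inv: Investigator, required_approval: str) -> bool:
    """EL membership alone grants nothing: the investigator must also hold
    the approval that governs the dataset being requested."""
    return inv.on_eligibility_list and required_approval in inv.approvals


# An EL member with an executed CDSA can reach CDSA-governed data ...
print(has_dataset_access(Investigator("A", True, {"cdsa"}), "cdsa"))  # True
# ... but EL membership without a matching approval is not enough.
print(has_dataset_access(Investigator("B", True), "cdsa"))  # False
```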

PRIMED data is stored, shared, and accessed via the AnVIL platform. All Consortium members must agree to abide by the Consortium Member Responsibilities in the Consortium Guidelines for AnVIL Data Access, including enabling two-factor authentication on their Google account, in order to be granted access.

Additional eligibility requirements specific to coordinated dbGaP applications are provided under PRIMED Coordinated dbGaP Applications below.

When an Eligibility List member who serves as a CDSA investigator REPRESENTATIVE changes institution and does not retain an appointment at their original institution (i.e., their institution at the time of joining PRIMED), the Coordinating Center is responsible for removing reader/uploader access for that REPRESENTATIVE and their team after a three-month grace period.

Mechanisms of Consortium Data Access and Sharing

Due to the volume and heterogeneity of data and data sources used in PRIMED, multiple sharing mechanisms are needed. Two primary mechanisms are described below. Additional coordinated access mechanisms may be added over the life of the Consortium to enable access and sharing of additional controlled-access datasets with requirements that are unique or otherwise unmet by the two primary mechanisms, such as the UK Biobank (UKBB), eMERGE Network, All of Us Research Program (AoU), and Million Veteran Program (MVP). Note that open/unrestricted access data can be shared within the PRIMED Consortium without a coordinated access mechanism (see Data Management and Access for more information).

Figure 1. PRIMED data sharing circles. Gold and teal indicate primary data sharing circles in PRIMED for controlled-access data: the PRIMED dbGaP and PRIMED-SAG data sharing circles, respectively. Hexagons note the governing body overseeing each circle. The purple circles indicate additional mechanisms of coordinated access beyond the PRIMED dbGaP and PRIMED-SAG circles, e.g. to coordinate access to resources such as UKBB, eMERGE, AoU, and MVP. The green circles indicate that additional Site-specific data sharing circles may exist; such circles are outside the scope of this DSP. 

PRIMED Coordinated dbGaP Applications

PRIMED is using coordinated dbGaP applications to create a sharing circle for controlled-access data released by or otherwise accessible via dbGaP (gold circle in Figure 1). Applications with the same title, Research Use Statement, collaborator list, and other key elements allow investigators at different institutions (i.e., investigators not otherwise eligible to be covered by a single application) to share data. Data for a given study-consent group can only be shared among applicants with approved Data Access Requests (DARs) to that study-consent group. This process was modeled on a precedent from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. PRIMED investigators should follow the Instructions for PRIMED Coordinated dbGaP Applications to apply. Approval is expected if the PRIMED-specific process is followed completely.
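
The rule that data for a study-consent group may only be shared among applicants holding approved DARs to that group can be sketched as a set lookup. The `sharing_partners` function, the shape of the DAR mapping, and the applicant and group names are hypothetical illustrations.

```python
def sharing_partners(approved_dars: dict[str, set[str]], applicant: str,
                     study_consent_group: str) -> set[str]:
    """Return the other coordinated applicants with whom `applicant` may share
    data from `study_consent_group`.

    `approved_dars` maps each study-consent group to the set of applicants
    whose DARs for that group have been approved (a hypothetical structure).
    """
    approved = approved_dars.get(study_consent_group, set())
    if applicant not in approved:
        # Without an approved DAR, the applicant cannot share or receive the data.
        return set()
    return approved - {applicant}


dars = {
    "study1-HMB": {"site_a", "site_b", "site_c"},
    "study2-DS-CVD": {"site_a"},
}
print(sorted(sharing_partners(dars, "site_a", "study1-HMB")))  # ['site_b', 'site_c']
print(sharing_partners(dars, "site_a", "study2-DS-CVD"))       # set()
```

Because sharing is evaluated per study-consent group, two applicants can be partners for one group and not for another.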

Before submitting an application, applicants should review information about the datasets they plan to request to determine whether either of the following is needed; obtaining this documentation may take time:

  • If requesting datasets that have a consent group that contains an Institutional Review Board (IRB) modifier, applicants must obtain local (i.e. from their institution) IRB approval for their proposed PRIMED analyses. 
  • If requesting datasets that have a consent group that contains a Collaboration Required (-COL) modifier, the applicant must provide a letter of collaboration with the primary study investigator(s). The letter of collaboration must be renewed every year. Refer to Letter of Collaboration templates.
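
The modifier check above can be sketched as a small parser over consent-group codes. This assumes hyphen-delimited codes whose first token is the base consent type (e.g., `GRU`, `HMB`, or `DS-CVD`, used here only as illustrations); the `required_documentation` function is hypothetical and handles only the IRB and COL modifiers named in this policy.

```python
def required_documentation(consent_code: str) -> list[str]:
    """List the documentation to obtain before applying, based on the
    modifiers in a consent-group code.

    Assumes hyphen-delimited codes whose first token is the base consent
    type (e.g. "HMB-IRB-COL"); only IRB and COL modifiers are handled.
    """
    # Skip the first token so the base consent type is never read as a modifier.
    modifiers = set(consent_code.upper().split("-")[1:])
    docs = []
    if "IRB" in modifiers:
        docs.append("local IRB approval for the proposed PRIMED analyses")
    if "COL" in modifiers:
        docs.append("letter of collaboration with the primary study "
                    "investigator(s), renewed every year")
    return docs


for code in ["GRU", "HMB-IRB", "DS-CVD-IRB-COL"]:
    print(code, required_documentation(code))
```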

To initiate a dbGaP application, the following prerequisites must be met:

  • The applicant must be eligible to apply to access dbGaP data: dbGaP Authorized Access Portal > click on “Who can apply for access?”
  • The applicant must follow the dbGaP process for applying for access: dbGaP Authorized Access Portal > click on “How does one apply?”
  • The applicant’s dbGaP account must be in good standing.

Separate access requests must be filed per institution, even if collaborators are within the same funded Study Site within PRIMED. When a study investigator moves to a new institution, they must submit a new request from that institution. Data obtained via application from one institution may not be transferred to another institution. The PRIMED Eligibility List is appended to each PRIMED application to indicate external collaborators (i.e., collaborators at a different institution than the applicant) with whom data can be shared via the coordinated applications; see Data Management and Access below.

PRIMED Consortium Data Sharing Agreement (CDSA)

The PRIMED Consortium Data Sharing Agreement (CDSA) enables intra-Consortium sharing of studies and datasets provided by PRIMED Study Sites and/or Affiliate Member data contributors. The CDSA therefore creates an additional sharing circle (teal circle in Figure 1) and enables sharing of additional data (e.g., data not available via dbGaP). The PRIMED CDSA was modeled on precedents from the CHARGE and C4R consortia.

PRIMED Core Members do not require Steering Committee approval to join the PRIMED-SAG, either as MEMBERs or DATA AFFILIATES. PRIMED Affiliate Members do require Steering Committee approval to join the PRIMED-SAG, either as DATA AFFILIATES or NON-DATA AFFILIATES. COMPONENTS (associated centers or institutions who wish to use SAG Data) can join the PRIMED-SAG if they are represented by and have a relationship to their respective MEMBER, DATA AFFILIATE, or NON-DATA AFFILIATE. COMPONENTS do not require Steering Committee approval to join the PRIMED-SAG. However, each MEMBER, DATA AFFILIATE, NON-DATA AFFILIATE, and each of their COMPONENTS must submit a copy of the signed and executed Consortium Data Sharing Agreement (CDSA) to the PRIMED CC.

For those interested in signing the CDSA as a DATA AFFILIATE:

  • If the PI who can sign the CDSA on behalf of the study/cohort/consortium is already a PRIMED Core Member, they will not need to apply for Affiliate Membership.
  • If the PI who can sign the CDSA on behalf of the study/cohort/consortium is not already a PRIMED Core Member, they will need to apply for Affiliate Membership through the Data Affiliate application process (see Affiliate Membership Policy).

For those interested in signing the CDSA as a NON-DATA AFFILIATE:

  • If the PI who can sign the CDSA on behalf of a research team/lab is already a PRIMED Core Member, they will not need to apply for Affiliate Membership.
  • If the PI who can sign the CDSA on behalf of the group/research team is not already a PRIMED Core Member, they will need to apply for Affiliate Membership through the Expertise-only (Non-Data Affiliate) application process (see Affiliate Membership Policy).

See also the definition of Affiliate Member Applicant in the PRIMED Affiliate Membership Policy.

Data Management and Access

PRIMED Consortium data are to be uploaded, stored, and accessed on the NHGRI’s AnVIL cloud platform. Data in the PRIMED Consortium AnVIL workspaces will be accessible only by eligible PRIMED investigators who have been granted secure data access via the PRIMED Mechanisms of Access (i.e., Coordinated dbGaP Applications; signed and executed PRIMED Consortium Data Sharing Agreement). Data access permits data management and analysis in AnVIL workspaces. Downloads of data are not permitted.

Details on the management of PRIMED Consortium data, including data organization in AnVIL open and controlled-access workspaces, management of user access lists, expectations for data formatting, and the process for users to upload, validate, and access data, are described in the PRIMED Data Management and Funding Plan. The plan also describes the distribution of cloud expenses, including data storage and computation, across the PRIMED Study Sites and Coordinating Center.

SAG management

Below, “VN” refers to the current major version of the CDSA, “VN.K” to differing minor versions within the current major version, and “V(N-1)” to the previous major version.

  • The SAG is formed by signatories of the current major version of the CDSA (VN). Signatories within the SAG may differ in minor version (e.g., CDSA VN.K).
  • A major version becomes current immediately upon approval by the PRIMED Steering Committee.
    • Data from new DATA AFFILIATE signatories can be uploaded and made available once they sign the current major version.
  • When the CDSA undergoes a major version update (i.e., from V(N-1) to VN), signatories of the previous version have a three-month grace period to sign the new version in order to remain in the SAG.
    • Data uploaded under previous version(s) of the CDSA, i.e., V(N-1), remain available to pre-existing and new signatories during the grace period.
    • Data uploaded under CDSA VN during the grace period become available to pre-existing and new signatories during the grace period.
    • For all signatory categories, failure to update to the current major version by the end of the grace period results in removal from the SAG and loss of access to the data available in the SAG.
    • For DATA AFFILIATES, failure to update to the current major version by the end of the grace period also results in removal of their data contribution from the SAG. The PRIMED Executive Committee may grant an extension if documentation of signature progress can be provided.
  • A signatory can update to the current major version of the CDSA (VN) through either of two routes:
    • (1) Have previously signed CDSA V(N-1) and sign the V(N-1)-to-VN Amendment
      • For example, when V3.0 is the current version, sign V2.1 and the V3.0 Amendment
    • (2) Sign CDSA VN (or CDSA VN.K)
      • For example, when V3.0 is the current version, sign V3.0 or V3.K
  • Future major version updates to the CDSA (beyond V3.0) are not anticipated at this time, given the multiple rounds of institutional feedback and consultation, over roughly 1.5 years from initial drafting, that were required to arrive at V3.0. Major updates may be considered in the future at the discretion of the PRIMED Executive Committee.
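
The version and grace-period rules above can be sketched as follows. This is a minimal illustration, not a PRIMED tool: it assumes the three-month grace period is counted in calendar months from Steering Committee approval of the current major version and that the final day is inclusive; the function names and the approval date in the example are hypothetical, and Executive Committee extensions and removal of DATA AFFILIATE data are not modeled.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def effective_major(signed_version: str, amendment_to: Optional[int] = None) -> int:
    """Major CDSA version a signatory is on.

    Route (2): signing VN or VN.K gives major N.
    Route (1): a V(N-1) signatory who signs the V(N-1)-to-VN Amendment is on N.
    """
    major = int(signed_version.split(".")[0])
    if amendment_to == major + 1:
        return amendment_to
    return major


def sag_status(signatory_major: int, current_major: int,
               current_approved_on: date, today: date) -> str:
    """Return "member", "grace", or "removed" (assumed inclusive deadline)."""
    if signatory_major >= current_major:
        return "member"
    deadline = add_months(current_approved_on, 3)
    if signatory_major == current_major - 1 and today <= deadline:
        return "grace"
    return "removed"


approved = date(2024, 1, 15)  # hypothetical approval date for CDSA V3.0
print(sag_status(effective_major("2.1"), 3, approved, date(2024, 3, 1)))     # grace
print(sag_status(effective_major("2.1", 3), 3, approved, date(2024, 5, 1)))  # member
print(sag_status(effective_major("2.1"), 3, approved, date(2024, 5, 1)))     # removed
```

Both update routes converge on the same effective major version, so the status check does not need to know which route a signatory took.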

Data Sharing Guidelines and Expectations

Data Sharing within PRIMED

PRIMED Study Sites will share data with each other using the Consortium data sharing mechanisms defined above and will leverage AnVIL shared workspaces managed by the CC (see also PRIMED Data Management and Funding Plan). Data sharing applies to source data and to any data that are derived, harmonized, imputed, or re-processed from those data. Existing summary-level data (e.g., GWAS summary statistics, PRS weights, allele frequencies) that can be shared will be shared within the Consortium in AnVIL. All individual participant or individual-level genotype and phenotype data that can be shared will be shared within the Consortium in AnVIL. Some datasets may not be shared due to data use limitations (DULs) or restrictions on redistributing the data in AnVIL.

Data upload and sharing within the Consortium should (1) be prompt and support the productivity of the Consortium and (2) be in alignment with PRIMED RFA and NIH program requirements and directives.

Data Sharing outside of PRIMED

Data generated within PRIMED will be shared with the broader scientific community whenever possible and in alignment with NIH data sharing policies and study-specific data use limitations. As a Consortium developing PRS methods based largely on secondary use of extant data, data generated by the Consortium is expected to fall into the following general categories:

  1. Summary-level: Association analysis (i.e. GWAS) results
  2. Summary-level: PRS models - i.e. lists of variants in a polygenic risk score, associated weights, and other relevant metadata
  3. Individual-level: harmonized phenotype and genotype data, which is based on pre-existing source (i.e. unharmonized) data 
  4. Individual-level: newly generated or collected phenotype and genotype data, which may be produced in PRIMED to a lesser extent, e.g., through the supplemental genotyping program

Below are opportunities for sharing these data generated by the Consortium with the broader scientific community, along with caveats.

  1. Newly generated summary-level data (e.g., GWAS summary statistics, allele frequencies, PRS weights, PRS models, other analysis outputs) along with associated documentation will be released outside of the Consortium on a platform/repository and under the access model appropriate for the given datasets:
    1. Summary-level data requiring controlled access may be registered via dbGaP and shared on the AnVIL platform, with specific data use limitations inherited from the source individual-level data.
    2. Open access GWAS summary statistics may be deposited to the GWAS Catalog.
    3. Open access PRS models may be deposited to the PGS Catalog.
  2. Newly generated individual-level data (e.g. harmonized phenotypes, imputed genotypes) derived from source data accessed via dbGaP applications may be shared outside the Consortium through the existing study accessions for the source studies, with specific data use limitations inherited from the source individual-level data.
    1. This requires obtaining permission from and working closely with the study owners/data generators.
  3. Pre-existing data obtained from publicly available open or controlled access sources will not be re-released and/or redistributed by the PRIMED Consortium.

Note that the sharing of Genomic Summary Results (GSR) in PRIMED will follow the 2018 Update to NIH Management of Genomic Summary Results Access. After consultation with the NHGRI Data Sharing Governance Committee, PRIMED considers PRS models to be GSR, and PRS models therefore fall under the NIH GSR sharing policies.

PRIMED investigators with questions on sharing Consortium-generated data products with the broader scientific community should consult the Data Sharing Working Group, Coordinating Center, and/or PRIMED NIH program staff. PRIMED should leverage the opportunity to document potential gaps or lack of clarity in sharing policies, including potential solutions.

Data Use Guidelines

Each Consortium member agrees to comply with all limitations and use restrictions accompanying the PRIMED data or data accessed via the PRIMED Mechanisms of Access described above. Consortium members and other data contributors will convey any such limitations to other members that access these data through PRIMED sharing mechanisms. Limitations should appropriately reflect the informed consent of Study participants from whom the data shared under this agreement were collected and derived. Each data contributor or data generator will provide any limitations (e.g., cardiovascular disease research only; sharing of summary results requires controlled access, etc.) on use of the data they are sharing with the Consortium, and, when in doubt about the appropriateness of sharing, will consult with an ethics board or IRB. As applicable, all data use must be consistent with dbGaP approvals; NIH and PRIMED policies; and participant consents as specified in the “NIH Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy”.

All PRIMED analyses and scientific activities within scope of the PRIMED Publications Policy require an approved proposal. Data obtained via the PRIMED Mechanisms of Access described above should therefore only be used under the auspices of an approved proposal.

dbGaP data

When study investigators apply for access to controlled access data (e.g. dbGaP), they will certify that all data uses will comply with the Data Use Limitations specified for each study and consent group. Data from an individual with a disease-specific consent will not be used in analyses outside of that restriction, unless specifically allowed by the Data Use Limitations.

PRIMED-SAG Data (via CDSA)

Additionally, any PRIMED Consortium member (or group of members) who wishes to use PRIMED-SAG Data should indicate the proposed studies/cohorts/consortia (DATA AFFILIATES who signed the PRIMED Consortium Data Sharing Agreement) when submitting a PRIMED paper proposal. The contact for each such study/cohort/consortium (if they opt in) will receive an email to review the paper proposal (approve or disapprove it and leave comments regarding the use of their study’s data). This optional measure gives DATA AFFILIATES oversight of how their data are used within the PRIMED-SAG.

In order for CDSA COMPONENTS to obtain access/upload permissions to PRIMED-SAG Data, the Primary MEMBER or AFFILIATE with which the COMPONENT is associated must first obtain such access/upload permissions to PRIMED-SAG Data. In other words, the Primary MEMBER, DATA AFFILIATE, or NON-DATA AFFILIATE must first join the PRIMED-SAG (via a fully executed PRIMED Consortium Data Sharing Agreement) before its respective MEMBER COMPONENT, DATA AFFILIATE COMPONENT, or NON-DATA AFFILIATE COMPONENT can join the PRIMED-SAG. Similarly, if a Primary MEMBER or AFFILIATE withdraws from the PRIMED-SAG, any associated COMPONENTS will no longer be active (see also Term 7 of the CDSA). Further, DATA AFFILIATE COMPONENT uploaders require approval from the Primary DATA AFFILIATE’s Investigator REPRESENTATIVE.
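
The dependency of COMPONENTS on their Primary signatory can be sketched as a recursive activity check. The `Signatory` class and both functions are hypothetical illustrations; the sketch assumes a COMPONENT carries the category of its Primary and models only the join, withdrawal, and uploader-approval rules stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signatory:
    name: str
    # "MEMBER", "DATA AFFILIATE", or "NON-DATA AFFILIATE"; a COMPONENT carries
    # the category of the Primary signatory it is associated with.
    category: str
    primary: Optional["Signatory"] = None  # set only for COMPONENTS
    cdsa_executed: bool = False
    withdrawn: bool = False


def is_active(s: Signatory) -> bool:
    """Active once the CDSA is executed and not withdrawn; a COMPONENT
    additionally requires its Primary to be active."""
    if not s.cdsa_executed or s.withdrawn:
        return False
    return s.primary is None or is_active(s.primary)


def data_affiliate_component_may_upload(component: Signatory,
                                        approved_by_primary_rep: bool) -> bool:
    """DATA AFFILIATE COMPONENT uploaders need approval from the Primary
    DATA AFFILIATE's Investigator REPRESENTATIVE."""
    if component.primary is None or component.primary.category != "DATA AFFILIATE":
        raise ValueError(f"{component.name} is not a DATA AFFILIATE COMPONENT")
    return is_active(component) and approved_by_primary_rep


primary = Signatory("Cohort X", "DATA AFFILIATE", cdsa_executed=True)
center = Signatory("Center Y", "DATA AFFILIATE", primary=primary, cdsa_executed=True)
print(is_active(center))                                   # True
print(data_affiliate_component_may_upload(center, False))  # False
primary.withdrawn = True
print(is_active(center))                                   # False: Primary withdrew
```

Checking the Primary on every call, rather than caching a COMPONENT's status, means a Primary's withdrawal immediately deactivates its COMPONENTS.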


List of Abbreviations

AnVIL: Genomic Data Science Analysis, Visualization, and Informatics Lab-space

CC: PRIMED Coordinating Center

CDSA: Consortium Data Sharing Agreement

DAC: Data Access Committee

DAR: Data Access Request

dbGaP: The Database of Genotypes and Phenotypes

DSP: Data Sharing Policy

DUL: Data Use Limitations

EL: PRIMED Eligibility List

GDS: Genomic Data Sharing

GSR: Genomic Summary Results

GWAS: Genome-Wide Association Studies

IRB: Institutional Review Board

NCI: National Cancer Institute

NHGRI: National Human Genome Research Institute

NIH: National Institutes of Health

PRIMED: Polygenic Risk Methods in Diverse Populations

PRS: Polygenic Risk Score


Change Log

  • V2.1 - Minor update to Figure 1 to depict the relationship of this DSP to the data sharing circles.
  • V2.0 - Major policy update, approved on the November 15, 2023 Steering Committee call
    • Clarifying management of the SAG, specifically relating to CDSA major version updates
    • Adding determination from the NHGRI Data Sharing Governance Committee that PRS models are GSR
  • V1.0 - Initial Policy, approved on the August 16, 2023 Steering Committee call.