
Genomic Research for Clinical Researchers: Leveraging GWAS

A practical guide for clinical researchers on leveraging GWAS results through polygenic risk scores (PRS) and Mendelian randomization (MR) for disease prediction and causal inference.

Applied Research Expanding from GWAS: PRS and MR

More than 20 years have passed since the first genome-wide association studies (GWAS) were reported, and the method is now widely used by clinical researchers as well as basic researchers. The idea behind GWAS is simple: it is an exhaustive, brute-force search across genetic variants throughout the genome for associations with diseases and traits. Conducting a GWAS, however, requires substantial human and financial resources, from participant recruitment to sample processing, genotyping, and large-scale data analysis. For this reason, many consortia and biobanks have prioritized GWAS since their establishment and run the analyses soon after data are collected. By contrast, clinical researchers and epidemiologists still have few opportunities in practice to conduct a GWAS themselves.

For clinical researchers and epidemiologists, however, GWAS results are more than something to read and interpret. The vast amount of information generated by GWAS has opened up a field of secondary analyses in which clinical knowledge can be applied to great effect. In this article, we first give an overview of GWAS and then discuss two applied methods, polygenic risk scores (PRS) and Mendelian randomization (MR), and their potential uses in clinical research.

GWAS Basics

A basic GWAS tests each of approximately one million autosomal SNPs, measured with a genotyping array, for association with a disease or trait. Some studies also include the sex chromosomes, but mitochondrial DNA is rarely analyzed. In many studies, genotyping is followed by imputation, a procedure that estimates unobserved variants from a reference panel, so that association analyses can cover tens of millions of variants, including SNPs and some indels. With whole-genome sequencing (WGS), the scope can be extended further to low-frequency variants and structural variants such as copy number variations (CNVs). Although the properties of the target variants and the statistical models differ, the core concept of GWAS, testing across the entire genome, remains the same. The resulting GWAS summary statistics are the starting point for secondary analyses collectively referred to as “post-GWAS.” PRS and MR are representative examples; both mainly use GWAS results for autosomal SNPs, PRS for disease risk prediction and MR for causal inference.
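To make the "test every variant" idea concrete, here is a minimal sketch of a per-SNP association scan. It assumes a quantitative trait and a simple linear regression of the trait on allele dosage (0, 1, or 2), with a normal approximation for the p-value. The SNP IDs, genotypes, and trait values are toy data invented for illustration, and the function name `snp_association` is hypothetical. Real analyses adjust for covariates, often use logistic or mixed models, and rely on dedicated software.

```python
import math

def snp_association(dosages, trait):
    """Regress a quantitative trait on allele dosage (0/1/2) for one SNP.

    Returns (beta, standard error, two-sided p-value). The p-value uses a
    normal approximation, which is a simplification for small samples.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p_value

# Toy data (hypothetical): six individuals, two SNPs, one trait.
trait = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
genotypes = {
    "rs_toy1": [0, 0, 1, 1, 2, 2],  # dosage tracks the trait
    "rs_toy2": [0, 1, 2, 2, 1, 0],  # dosage unrelated to the trait
}

# The "genome-wide" scan is the same test repeated for every SNP.
results = {snp: snp_association(d, trait) for snp, d in genotypes.items()}

for snp, (beta, se, p_value) in results.items():
    print(f"{snp}: beta={beta:.3f} se={se:.3f} p={p_value:.3g}")
```

The per-SNP `beta` and `se` values collected this way are what a summary statistics file contains, which is why the downstream methods can work without individual-level data.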

PRS: Risk Prediction

PRS is an index of disease risk estimated from an individual’s genomic information. The idea is simple. Suppose SNP1 confers a 1.1-fold increase in risk per effect allele and SNP2 a 0.9-fold change. Once the effect size of each SNP has been estimated (a β coefficient, or the logarithm of an odds ratio), an individual’s overall score is the sum of these effect sizes, each weighted by the number of effect alleles (0, 1, or 2) that the individual carries. Odds ratios are summed on the log scale, which corresponds to multiplying them on the original scale. The effect sizes are taken from previously published GWAS summary statistics. Only two things are therefore needed to calculate a PRS: (1) GWAS summary statistics for the disease of interest, and (2) genomic data from the population in which risk is to be estimated. Because summary statistics do not contain individual-level data, they are often made freely available online by large consortia and biobanks around the world, making them readily accessible even to clinical researchers.
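The calculation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the SNP IDs, effect alleles, and the individual's genotype are made up, the function name `polygenic_risk_score` is hypothetical, and the odds ratios 1.1 and 0.9 are the illustrative values from the text. Each SNP contributes its effect-allele count multiplied by the log odds ratio.

```python
import math

# Hypothetical GWAS summary statistics: SNP id -> (effect allele, odds ratio).
summary_stats = {
    "rs_toy1": ("A", 1.1),
    "rs_toy2": ("G", 0.9),
}

def polygenic_risk_score(dosages, summary_stats):
    """Sum of (effect-allele count x log odds ratio) over shared SNPs.

    `dosages` maps SNP id -> number of effect alleles (0, 1, or 2).
    SNPs missing from the individual's data are skipped.
    """
    score = 0.0
    for snp, (_effect_allele, odds_ratio) in summary_stats.items():
        if snp in dosages:
            score += dosages[snp] * math.log(odds_ratio)
    return score

# Hypothetical individual: two copies of the rs_toy1 effect allele,
# one copy of the rs_toy2 effect allele.
person = {"rs_toy1": 2, "rs_toy2": 1}
prs = polygenic_risk_score(person, summary_stats)

# Back on the odds scale, the sum of logs is the product 1.1 * 1.1 * 0.9.
print(f"PRS (log scale) = {prs:.4f}, combined odds ratio = {math.exp(prs):.3f}")
```

In practice a score uses many more SNPs, and the raw value is usually interpreted relative to the score distribution in a reference population, not as an absolute risk.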

Although PRS may seem similar to family history, which is commonly used in epidemiologic research, the two differ in the following respects. Because of these differences, there has been active debate over which is more useful, PRS or family history. However, the indicators needed vary depending on the clinical and public health context, and approaches that use the two in a complementary manner are currently attracting attention.

PRS vs. Family History

| Item | PRS | Family history |
| --- | --- | --- |
| Variable type | Continuous variable | Categorical variable |
| Target diseases | Diseases for which GWAS has been conducted in prior studies and summary statistics are available | Common diseases with relatively high prevalence |
| Genomic data | Required | Not required |
| Dependence on family structure | None | Present |
| Influence of lifestyle | Purely genetic | Influenced by factors shared within families, such as lifestyle |
| Ethnicity/ancestry | High accuracy in European populations, where GWAS is particularly well developed | No difference |

MR: Causal Inference

MR is an instrumental variable approach that uses genetic variants to estimate the causal effect of an exposure (phenotype A) on an outcome (phenotype B) from only two independent sets of GWAS summary statistics. The procedure is simple: for each SNP, obtain (1) the effect size of the SNP on phenotype A and (2) the effect size of the SNP on phenotype B, and then divide the second by the first (the Wald ratio) to estimate the causal effect of phenotype A on phenotype B. The only inputs required are (1) GWAS summary statistics for phenotype A and (2) GWAS summary statistics for phenotype B. Analysts need only download publicly available summary statistics and do not have to handle individual-level data directly. One important consideration is that the two GWAS should (1) have non-overlapping samples and (2) be matched in ethnicity/ancestry.

In Europe, MR research has advanced ahead of other regions because of the abundance of publicly available GWAS data. More recently, however, large-scale biobanks have also been developed in East Asia, particularly in Japan, and momentum is building to generate new evidence using the same approach.
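The Wald ratio described in this section can be sketched as follows. The effect sizes and standard errors are toy numbers invented for illustration, and the function names are hypothetical. The standard error of each ratio uses a common first-order approximation that ignores uncertainty in the SNP-exposure estimate. Combining several SNPs with inverse-variance weighting (IVW) is one widely used choice that the article itself does not describe; it is included here as an assumption about how the per-SNP ratios might be pooled.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimate: SNP-outcome effect / SNP-exposure effect.

    The standard error is a first-order approximation that treats the
    SNP-exposure effect as known without error.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

def inverse_variance_weighted(ratios):
    """Pool per-SNP (estimate, se) pairs with weights 1 / se^2."""
    weights = [1.0 / se ** 2 for _, se in ratios]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, ratios)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Toy instruments (hypothetical):
# (SNP effect on phenotype A, SNP effect on phenotype B, SE of effect on B)
instruments = [
    (0.10, 0.050, 0.01),
    (0.20, 0.100, 0.02),
    (0.05, 0.025, 0.01),
]

ratios = [wald_ratio(bx, by, se_y) for bx, by, se_y in instruments]
causal_estimate, causal_se = inverse_variance_weighted(ratios)

print(f"Estimated effect of A on B: {causal_estimate:.3f} (SE {causal_se:.3f})")
```

Both inputs must refer to the same effect allele for each SNP before the ratio is taken; this harmonization step is omitted from the sketch.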