Meta-Analysis
Cell. 2020 Sep 3;182(5):1214-1231.e11.
doi: 10.1016/j.cell.2020.08.008.

The Polygenic and Monogenic Basis of Blood Traits and Diseases

Dragana Vuckovic  1 Erik L Bao  2 Parsa Akbari  3 Caleb A Lareau  4 Abdou Mousas  5 Tao Jiang  6 Ming-Huei Chen  7 Laura M Raffield  8 Manuel Tardaguila  9 Jennifer E Huffman  10 Scott C Ritchie  11 Karyn Megy  12 Hannes Ponstingl  9 Christopher J Penkett  13 Patrick K Albers  9 Emilie M Wigdor  9 Saori Sakaue  14 Arden Moscati  15 Regina Manansala  16 Ken Sin Lo  5 Huijun Qian  17 Masato Akiyama  18 Traci M Bartz  19 Yoav Ben-Shlomo  20 Andrew Beswick  21 Jette Bork-Jensen  22 Erwin P Bottinger  23 Jennifer A Brody  24 Frank J A van Rooij  25 Kumaraswamy N Chitrala  26 Peter W F Wilson  27 Hélène Choquet  28 John Danesh  29 Emanuele Di Angelantonio  29 Niki Dimou  30 Jingzhong Ding  31 Paul Elliott  32 Tõnu Esko  33 Michele K Evans  26 Stephan B Felix  34 James S Floyd  35 Linda Broer  36 Niels Grarup  22 Michael H Guo  37 Qi Guo  38 Andreas Greinacher  39 Jeff Haessler  40 Torben Hansen  22 Joanna M M Howson  41 Wei Huang  42 Eric Jorgenson  28 Tim Kacprowski  43 Mika Kähönen  44 Yoichiro Kamatani  45 Masahiro Kanai  46 Savita Karthikeyan  38 Fotios Koskeridis  47 Leslie A Lange  48 Terho Lehtimäki  49 Allan Linneberg  50 Yongmei Liu  51 Leo-Pekka Lyytikäinen  49 Ani Manichaikul  52 Koichi Matsuda  53 Karen L Mohlke  8 Nina Mononen  49 Yoshinori Murakami  54 Girish N Nadkarni  15 Kjell Nikus  55 Nathan Pankratz  56 Oluf Pedersen  22 Michael Preuss  15 Bruce M Psaty  57 Olli T Raitakari  58 Stephen S Rich  52 Benjamin A T Rodriguez  7 Jonathan D Rosen  59 Jerome I Rotter  60 Petra Schubert  61 Cassandra N Spracklen  62 Praveen Surendran  63 Hua Tang  64 Jean-Claude Tardif  65 Mohsen Ghanbari  66 Uwe Völker  67 Henry Völzke  68 Nicholas A Watkins  69 Stefan Weiss  67 VA Million Veteran Program; Na Cai  9 Kousik Kundu  70 Stephen B Watt  9 Klaudia Walter  9 Alan B Zonderman  26 Kelly Cho  71 Yun Li  72 Ruth J F Loos  15 Julian C Knight  73 Michel Georges  74 Oliver Stegle  75 Evangelos Evangelou  76 Yukinori Okada  77 David J Roberts  78 Michael Inouye  79 Andrew D Johnson  7 Paul L Auer  16 William J Astle  80 Alexander P Reiner  81 Adam S Butterworth  29 Willem H Ouwehand  82 Guillaume Lettre  65 Vijay G Sankaran  83 Nicole Soranzo  84

Dragana Vuckovic et al. Cell.

Abstract

Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.

Keywords: UK Biobank; blood; chromatin; fine-mapping; genetics; hematopoiesis; omnigenic; polygenic risk; rare disease; splicing.

Conflict of interest statement

Declaration of Interests: Adam Butterworth has received grants (outside of this work) from AstraZeneca, Biogen, BioMarin, Bioverativ, Merck, Novartis, and Sanofi; James Floyd has consulted for Shionogi; Qi Guo is a full-time employee of BenevolentAI; Joanna Howson is a full-time employee of Novo Nordisk; Parsa Akbari is a full-time employee of Regeneron Pharmaceuticals.

Figures

Graphical abstract
Figure 1
GWAS Study Design and Results (A–E) (A) Study design, (B) illustration for fine-mapping (FM) strategy showing how the FM blocks and the relevant number of causative signals were defined, (C) distribution of FM results by MAF, (D) distribution of FM results by sentinel annotation and MAF, and (E) FM 95% credible set size distribution for each sentinel, across all traits: different colors indicate different cell type groups.
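The 95% credible sets mentioned in panel (E) can be illustrated with a minimal sketch. It assumes the common definition of a credible set (the smallest group of variants whose posterior probabilities sum to at least the requested coverage); the caption does not spell out the authors' exact procedure, and the variant names and probabilities below are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of variant IDs whose posterior
    probabilities sum to at least `coverage`.

    `posteriors` maps a variant ID to its fine-mapping posterior
    probability for a single association signal.
    """
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, pp in ranked:
        chosen.append(variant)
        total += pp
        if total >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for one signal (illustration only).
example = {"var_a": 0.70, "var_b": 0.20, "var_c": 0.06, "var_d": 0.04}
cs = credible_set(example)
print(cs, "size:", len(cs))
```

A smaller credible set means the signal is resolved to fewer candidate variants, which is what the size distribution in panel (E) summarizes.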
Figure S1
Replication and Mendelian Genes, Related to Figures 1 and 2. A, Comparison of replication effect size estimates; the x-axis shows effect sizes in MVP, the y-axis shows effect sizes in UK Biobank. The zoom-in panel highlights non-replicating variants in red. B, Proportions of correct gene to variant assignments for VEP worst consequence and VEP all consequences divided by functional annotation. Only known eQTLs in matched cell-types are shown and the correct gene is assumed to be the one identified by the eQTL experiment (eGene). C, Variants assigned by VEP to Mendelian genes across different functional annotations have higher effect sizes compared to other variants, after matching for MAF. The top 5 panels show absolute effect size distributions across all sentinel variants, where sentinels associated with multiple traits were included only once with the highest effect size. The middle 5 panels show the same distributions but after matching the non-Mendelian variants to the Mendelian ones by MAF. Stars denote significance: ∗ 0.005 < p value < 0.05; ∗∗ 0.0005 < p value < 0.005; ∗∗∗ p value < 0.0005; FC = median fold change. The bottom 5 panels show the distributions of minor allele frequencies after matching.
Figure 2
Network Connectivity (A–B) Coexpression network in whole blood. For illustrative purposes, a subset of highly coexpressed genes is shown (correlation > 0.7). Edges are omitted for clarity, and the node size summarizes the number and strength of coexpression links. Blue dots represent genes detected by GWAS, violet dots are Mendelian genes, and red dots show the intersection. Grey dots are genes in the coexpression network that do not belong to any of the previous categories. GWAS genes are defined by two different variant annotation approaches: VEP all consequences (A) and 500kb FM regions (B). (C) Diagram showing the hypothesized genetic architecture of healthy blood traits. At the core of the underlying molecular network is the set of Mendelian genes which cause blood disorders when mutated. Peripherally to the core lie regulatory genes which affect the phenotype through core genes. Cis and trans-eQTLs can give insights about cell-type specificity and can identify master regulators, i.e., genes that trans-regulate several core genes simultaneously. (D) Enrichment of sets of genes in the coexpression network at different correlation cut-offs. Whiskers indicate 95% CI for the fold enrichment estimate. (E) Proportion of network genes among Mendelian, GWAS, or other genes with > 1 edge, or average number of edges, at different correlation cut-offs. (F) Example of a sub-network containing 3 Mendelian genes involved in platelets (GP9, ITGA2B, GP1BB). As in (A), blue dots are GWAS genes, red dots are GWAS and previously known Mendelian genes, and gray dots are other coexpressed genes.
Figure S2
Mendelian and Peripheral Enrichment Q-Q Plots, Related to Figure 2 Each Q-Q plot shows the enrichment for variants assigned to a 100kb interval surrounding Mendelian genes. Different GWAS traits are included: 4 exemplar blood traits and 8 unrelated traits, selected to have at least 500 significant GWAS associations. Overall, with the exception of the “Intelligence” trait, most non-blood phenotypes do not show enrichment for variants mapped to Mendelian blood disorder genes. Conversely, peripheral associations were more likely to be enriched in non-blood traits, showing enrichment for six out of eight traits. SMD = stem cell and myeloid disorders, BPD = bleeding and thrombotic disorders, BMF = bone-marrow failure; GW = genome-wide.
Figure S3
Network Examples and Functional Annotation, Related to Figure 2 A, A zoom-in example of the coexpression network, including connected genes with a very high correlation cut-off (0.8). Blue dots represent genes detected by GWAS, according to VEP worst-consequence annotation, red dots represent GWAS genes that are also Mendelian genes for blood disorders. Three Mendelian genes are identified, all of them involved in spherocytosis and other red-cell disorders. B-C, Receiver operating characteristic (ROC) curves for measuring classification performance of deltaSVM in two datasets: B) 18 hematopoietic populations sorted from bone marrow, and C) 8 stages of primary erythroid differentiation. D, Association between variant absolute deltaSVM score (maxSVM), reflecting a variant’s predicted disruption of chromatin accessibility, and bins of MAF. Dotted line indicates the median maxSVM score for the MAF 0.3-0.5 bin. E, Rare variants (MAC > 20, MAF < 1%, PPFM > 0.50, conditionally independent) grouped by genomic annotation. F, Flow-chart depicting the steps involved in the identification and validation of blood trait-associated splice variants. G, Density distribution of variant MAF, comparing 109 putative splice variants to all fine-mapped blood trait variants. H, Violin plot of the fine-mapped posterior probability (PPFM) for putative splice variants versus all fine-mapped variants. For variants fine-mapped to multiple blood traits, we used the maximum PPFM.
Figure 3
Functional Annotation of Blood Trait Variants (A) g-chromVAR results for FM variants (PPFM>0.1%) across 22 hematological traits. The Bonferroni-adjusted significance level (p = 0.05/(22 traits × 18 cell types)) is indicated by the dotted line. New traits are labeled in red. Novel enrichments are starred. The color legend for cell types is shared by panels (A), (B), and the trackplot in (H). mono = monocyte; gran = granulocyte; ery = erythroid; mega = megakaryocyte; CD4 = CD4+ T cell; CD8 = CD8+ T cell; B = B cell; NK = natural killer cell; mDC = myeloid dendritic cell; pDC = plasmacytoid dendritic cell; MPP = multipotent progenitor; LMPP = lymphoid-primed multipotent progenitor; CMP = common myeloid progenitor; CLP = common lymphoid progenitor; GMP = granulocyte-macrophage progenitor; MEP = megakaryocyte–erythroid progenitor. (B) g-chromVAR enrichment results across 4 platelet traits (MPV, mean platelet volume; PCT, plateletcrit; PDW, platelet distribution width; PLT, platelet count), using either all trait-associated variants (all), variants with any gene assignment (any gene), or only variants assigned to genes causative for BPD. The original Bonferroni-adjusted significance level is indicated by the dotted line. (C) The allelic effects of blood trait variants with (1) high (> 99th percentile) versus low (< 1st percentile) deltaSVM scores and (2) one or more predicted motif disruptions, on normalized motif scores. The normalized motif score represents the score for a variant-containing sequence as a percentage of the best score that motif could achieve on an ideal sequence. (D–F) Cell type-specific deltaSVM scores for variants disrupting the (D) GATA1, (E) CEBPA, or (F) GABPA motif compared to scores in non-motif-disrupting controls and non-lineage-specific cell types. Non-motif group indicates all other variants that do not disrupt the target TF. Gain or lost motif group contains variants predicted to create or disrupt the target TF motif, respectively, with the deltaSVM score for a lineage-specific cell type (erythroblast for GATA1, GMP for CEBPA, CD8 for GABPA). Non-lineage gain or lost indicates variants predicted to create or disrupt the target TF motif, but with the deltaSVM score for non-lineage-specific populations (CD8, CD4, and B cells for GATA1 and CEBPA; erythroblast and megakaryocytes for GABPA). (G) Lymphocyte count-associated variant rs72928038 has high chromatin accessibility (left) and deltaSVM score (right) in CD4 and CD8 populations. (H) rs72928038 is located within intron 1 of BACH2, and its minor allele A is predicted to break the motifs of TFs ETS1 and STAT3. In the bottom ATAC-seq plot, stacked colors represent accessibility for 18 hematopoietic cell types shown in (A).
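Two quantities in this caption are simple arithmetic and can be sketched directly: the Bonferroni-adjusted significance level for 22 traits tested across 18 cell types, and the normalized motif score defined in panel (C). The function names and the example motif scores are illustrative assumptions, not taken from the authors' code.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def normalized_motif_score(score, best_score):
    """Motif score of a sequence as a percentage of the best score
    the motif could achieve on an ideal sequence."""
    return 100.0 * score / best_score


# Values stated in the caption: 22 traits, 18 cell types, alpha = 0.05.
n_traits, n_cell_types = 22, 18
threshold = bonferroni_threshold(0.05, n_traits * n_cell_types)
print(f"{n_traits * n_cell_types} tests, threshold = {threshold:.3g}")

# Hypothetical raw motif scores, for illustration only.
print(normalized_motif_score(score=9.0, best_score=12.0))
```

The same helper applies to the phenome-wide analysis in Figure 4 by passing the number of phenotypes tested there as `n_tests`.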
Figure 4
Characterization of Rare Blood Trait Variants (A) Distribution of coding consequences of 456 rare variants (MAC > 20, MAF < 1%), annotated using VEP. (B) Phenome-wide association study of these 456 rare variants across 529 well-represented clinical phenotypes in the UK Biobank (n up to 408,961). Variants are grouped by the hematopoietic lineage with which they are associated (BASO, basophil; EO, eosinophil; LYMPH, lymphocyte; MONO, monocyte; NEUT, neutrophil; PLT, platelet; RBC, red blood cell; WBC, white blood cell). Some variants appear in more than one category if they are associated with traits from distinct lineages. Text labels indicate the clinical outcomes with the strongest association per category. The dotted line denotes the Bonferroni-adjusted significance level (corrected for 529 phenotypes). (C–E) Sashimi plots depicting splice alterations at 3 loci as determined by RNA-sequencing analysis, comparing carriers of a specified blood trait variant (top track) versus non-carriers (bottom track). (C) Intronic donor gain splicing event in CD3EAP among carriers of rs8113779 (PPFM = 0.23 for PLT, 2nd highest in credible set). Numbers within the splice junctions represent the number of reads supporting the junction. The x axis marks genomic coordinates. (D) Exonic donor gain splicing alteration in ULK3 associated with rs12898397 (PPFM = 0.071 for lymphocyte percent, 5th highest in credible set). (E) Donor loss splicing event in the TFR2 locus, induced by variant rs139178017 (PPFM = 0.73 for RDW, highest in credible set; PPFM = 0.4 for MCV, 2nd highest in credible set).
Figure 5
Polygenic Prediction of Blood Traits and Contribution to Common Diseases (A) Portability of the PGS across populations with European ancestry for 15 available traits. The red bar represents the Pearson’s correlation (R) between the score and the trait in the validation cohort (INTERVAL). Blue bars show the same in a French Canadian cohort called CARTAgENE. (B and C) Saturation analysis showing the number of discovered variants (B) and the proportion of heritability explained (C) as a function of GWAS sample size for mean platelet volume. The black dotted line is a linear projection of the first 3 points, the red dotted line is a linear interpolation of all points, and the red solid curve is the best model fitting the 4 points. (D) Number of loci with multiple sentinel variants, stratified by trait group. (E) Number of disease loci colocalizing (posterior probability > 99%) with at least one blood count locus, colored by known vs. new loci. (F–K) Examples of loci with multiple sentinels associated with blood cell counts, and with at least one disease-colocalization (red diamond) or PheWAS association (green diamond) for the following genes and diseases: ITGA4 and Inflammatory Bowel Disease (IBD) (F), RUNX1 and Rheumatoid Arthritis (G), NFKB1 and IBD (H), C1QTNF6 and Type-1 Diabetes (I), JAK2 and IBD (J), IL4 and asthma (K). In each panel, black dots show MAF (right y axis) and red dots show the effect size (in SD for the phenotype between brackets, left y axis) of each variant as a function of the variant’s position in the genomic interval.
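As a rough illustration of what panel (A) measures, the sketch below builds a polygenic score as a weighted sum of allele dosages and computes the Pearson correlation (R) between score and trait. This is the generic textbook form of a PGS, assumed here for illustration; the weights, genotypes, and trait values are hypothetical, and the authors' actual scoring method is not described in the caption.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical per-variant effect sizes and genotypes for four individuals.
weights = [0.10, -0.05, 0.20]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
trait = [0.5, 0.0, 0.6, 0.2]  # hypothetical standardized trait values

scores = [polygenic_score(g, weights) for g in genotypes]
r = pearson_r(scores, trait)
print(f"R = {r:.3f}")
```

Comparing R across cohorts, as the red and blue bars do, indicates how well a score derived in one population carries over to another.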
Figure S4
Saturation Models, Related to Figure 5. A, For each trait, we show the number of conditionally independent variants (y-axes) discovered by GWAS in four cohorts of increasing sample size. The sample size is shown on the x-axes in units of 10,000. Two linear regression lines are shown: the solid black line represents a regression including all 4 data points, the dotted black line represents a linear projection of the first three data points for comparison. A decreasing trend can be observed for almost all traits. B, Similarly to panel A, the number of GWAS-identified genes is shown on the y-axes. Genes were identified by VEP worst-consequence annotations. C, The same data points as in panel A are now shown with the best fitting model line in red, which corresponds to a square-root growth model. D, The same data points as in panel B are now shown with the best fitting model line in red, which corresponds to a square-root growth model. E, The plot shows the saturation analysis of the number of discovered Mendelian genes (red color) and peripheral genes (black color) as a function of the discovery sample size. Both lines represent the best fitting model interpolating the dots and are defined as a function of the square-root of the sample size.
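The square-root growth model in panels C to E can be sketched as an ordinary least-squares fit of the discovery count against the square root of the sample size. The form count = a * sqrt(n) + b (including the intercept b) is an assumption made for this illustration, and the four data points are synthetic; the caption gives neither the exact parameterization nor the underlying values.

```python
import math


def fit_sqrt_growth(sample_sizes, counts):
    """Least-squares fit of counts = a * sqrt(n) + b.

    Returns (a, b). This is a plain linear regression on sqrt(n).
    """
    xs = [math.sqrt(n) for n in sample_sizes]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(counts) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(a, b, n):
    """Predicted number of discoveries at sample size n."""
    return a * math.sqrt(n) + b


# Synthetic example: four cohorts, sample size in units of 10,000.
sizes = [4, 9, 16, 25]
counts = [250, 350, 450, 550]
a, b = fit_sqrt_growth(sizes, counts)
print(f"a = {a:.1f}, b = {b:.1f}, projected at n = 36: {predict(a, b, 36):.0f}")
```

Under such a model each additional participant yields fewer new discoveries than the last, which is consistent with the decreasing trend described for panel A.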
Figure 6
Contribution of Polygenic and Rare Variation to Blood Diseases (A) Density distribution of PLT (10⁹/liter) for UK Biobank participants who are heterozygous carriers (HET, red line) or wild-type (WT, black line) of the GP9 rs5030764 c.182A>G (p.Asn61Ser) variant pathogenic for Bernard-Soulier syndrome, plotted for participants whose PGS is above or below 2 SDs of the population platelet PGS. (B) Proportion of participants below the normal range for PLT (150 × 10⁹/l) depending on PGS quintiles and GP9 rs5030764 carriage status. (C) Absolute effect sizes comparison between different rare variant annotations and the common polygenic score. A subset of previously unreported missense variants shows high effect sizes comparable to known pathogenic ones, nominating them as putative new pathogenic candidates. The contribution of the polygenic score is comparable to that of a pathogenic variant in heterozygosity. Diamond shapes represent median values. (D) Forest plot showing the association of PGS with rare blood disorders; the top 30 results (by p-value) are shown. Significant associations, after Bonferroni correction, are indicated by the symbol for the discovery stage, while replication effects shown are all nominally significant. Diamonds represent odds ratios and whiskers show the 95% confidence interval.
