Evaluating the role of common risk variation in the recurrence risk of schizophrenia in multiplex schizophrenia families

Importance: Multiplex schizophrenia families have higher recurrence risk of schizophrenia compared to the families of singleton cases in the population, but the source of increased familial recurrence risk is unknown. Determining the source of this observation is essential, as it will define the relative focus on common versus rare genetic variation in case-control and family studies of schizophrenia.
Objective: To evaluate the role of common risk variation in the recurrence risk of schizophrenia, by comparing the polygenic risk scores in familial versus ancestry-matched singleton cases of schizophrenia.
Design: We used the latest genome-wide association study data of schizophrenia (N=166,464) to construct polygenic risk scores in multiplex family members, singleton cases and controls. To account for the high degree of relatedness in the samples, analyses were carried out using a mixed effects logistic regression model with the family structure modeled as a random effect.
Setting: Population and family based.
Participants: We used a large, homogeneous sample of 1,005 individuals from 257 families from the Irish Study of High-Density Schizophrenia Families, 2,224 singleton cases and 2,284 population controls, all from the population of the island of Ireland.
Exposures: Polygenic risk scores, diagnostic categories, familial or singleton case status.
Main outcomes and measures: The primary outcomes were the mixed effects logistic regression results generated from comparison between different groups.
Results: Polygenic risk scores in singleton cases did not differ significantly from familial cases (p=0.49), rejecting the hypothesis that an increased burden of common risk variation can account for the higher recurrence risk of schizophrenia in multiplex families.
Conclusions and relevance: This study suggests that a higher burden of common schizophrenia risk variation cannot account for the increased familial recurrence risk of schizophrenia in multiplex families. In the absence of elevated polygenic risk scores in multiplex schizophrenia families, segregation of rare variation in the genome and environmental exposures unique to the families may explain the increased multiplex familial recurrence risk of schizophrenia. These findings also further validate the concept of a genetically influenced psychosis spectrum in multiplex schizophrenia families as shown by a continuous increase of common risk variation burden from unaffected relatives to familial cases of schizophrenia in the families.


Introduction
Schizophrenia (SCZ) is a severe, clinically heterogeneous disorder with a prevalence of ~1% [1] and heritability (h²) estimates of ~80% [2][3][4][5][6]. Family history is the strongest risk factor for developing SCZ [7]. Despite high heritability, ~2/3 of SCZ cases report no family history of psychotic illness, and most subjects with a positive family history report only a single affected relative [8][9], concordant with the rates of 29% family history positive and 71% family history negative observed in the sample of singleton SCZ cases studied here [10].
Many linkage studies of SCZ were undertaken in samples of families with multiple cases of SCZ or related psychosis-spectrum disorders, like the Irish Study of High-Density Schizophrenia Families (ISHDSF) [11][12][13][14]. Such multiplex families display substantially higher recurrence risk of SCZ than reported in singleton cases [8][9]. This discrepancy in recurrence risk suggests that there may be important differences in the genetic or environmental risk architecture between familial and singleton SCZ cases that warrant further investigation.
One explanation of this difference is that SCZ cases from multiplex families may carry a higher burden of common risk variation, and higher SCZ polygenic risk scores (PRS), than ancestry-matched singleton cases. Another explanation is that the increased recurrence risk in multiplex families may be attributable to segregation of rarer, higher risk variation, identified through exome or whole-genome sequencing. Sequencing studies strongly suggest that rare, deleterious variation in the genome is involved in the genetic etiology of SCZ and other psychiatric disorders [15][16][17][18][19][20], but the extent to which rare variation contributes to SCZ risk in multiplex families is currently unknown. A third hypothesis, not addressed here, is that familial cases may have increased exposure to environmental risks.
Mega-analyses of SCZ genome-wide association study (GWAS) data by the Psychiatric Genomics Consortium Schizophrenia Working Group (PGC-SCZ) have identified common risk variants associated with SCZ [21][22][23]. In the most recent PGC mega-analysis (PGC3-SCZ), 270 independent risk loci were found to be robustly associated with SCZ, with single nucleotide polymorphism (SNP) based h² of ~24%. GWAS data from such analyses are frequently used to construct PRS to index an individual's common variant genetic risk for a disorder. Although PRS currently lack power to predict SCZ in the general population, they have been shown to index meaningful differences in SCZ liability between individuals. In the European PGC3-SCZ samples the highest PRS centile has an OR of 44 (95% CI=31-63) for SCZ compared to the lowest centile of PRS, and an OR of 7 (95% CI=5.8-8.3) when the top centile is compared with the remaining 99% of the individuals in the sample [23].
We have previously used the summary statistics from the first wave of PGC-SCZ mega-analysis [21] to investigate whether the concept of the psychosis spectrum is supported by empirical data in the ISHDSF [27]. PRS analyses have been performed for other psychiatric phenotypes in multiplex family samples smaller than the ISHDSF [24][25][26]. Here, we extend our previous work by using PRS profiling in a large, homogeneous sample of multiplex SCZ families, singleton SCZ cases and population controls from the island of Ireland to directly test the hypothesis that common risk variation in the genome may explain the increased recurrence risk of SCZ in multiplex SCZ families compared to families of singleton cases. Furthermore, to demonstrate the specificity of the constructed PRS from PGC3-SCZ GWAS for SCZ, we also constructed a PRS for low-density lipoprotein (LDL) as a negative control in our analysis. The source of the increased familial recurrence risk of SCZ is important for future research into the genetic etiology of familial SCZ and potentially for both diagnosis and treatment of SCZ with different familial backgrounds.

Irish Study of High-Density Schizophrenia Families
Probands in the ISHDSF sample were ascertained from psychiatric hospitals in the Republic of Ireland and Northern Ireland [28]. Inclusion criteria were two or more first-degree relatives meeting DSM-III-R criteria for SCZ or poor-outcome schizoaffective disorder (PO-SAD) and all four grandparents being born in Ireland or the United Kingdom. Relatives of probands were interviewed by trained field staff. Hospital and out-patient records were obtained and abstracted in >98% of cases with SCZ or PO-SAD diagnoses.
The concentric diagnostic schema of the ISHDSF includes 4 case definitions: narrow (SCZ, PO-SAD), intermediate (adding schizotypal personality, schizophreniform, and delusional disorders, atypical psychosis and good-outcome schizoaffective disorder), broad (adding psychotic affective illness, paranoid, avoidant and schizoid personality disorders and other disorders that significantly aggregate in relatives of Irish probands) and very broad (adding any other psychiatric illness). The ISHDSF sample also includes unaffected family members with no diagnosis of any psychiatric illness (Table 1). The ISHDSF diagnostic schema is described extensively elsewhere [29].
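medRxiv preprint doi: https://doi.org/10.1101/2021.06.21.21259285; this version posted June 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.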

Irish Schizophrenia Genomics Consortium Case/Control Sample (ISGC)
The ISGC sample was assembled for a GWAS of SCZ in Ireland. Details of recruitment, screening and quality control (QC) methods used for the ISGC sample have been described in detail elsewhere [30]. Briefly, the case sample was recruited through community mental health services and inpatient units in the Republic of Ireland and Northern Ireland following protocols with local ethics approval. All participants were interviewed using a structured clinical interview for DSM-III-R or DSM-IV, were over 18 years of age and reported all four grandparents born either in Ireland or the United Kingdom. Cases were screened to exclude substance-induced psychotic disorder or psychosis due to a general medical condition.
[...] with the additional ISHDSF individuals described above. The same QC protocols were applied to all three datasets and full details are described elsewhere for ISHDSF [29] and the case-control sample [30]. In brief, exclusion criteria for samples were a call rate of <95%, more than one Mendelian error in the ISHDSF, and a difference between reported and genotypic sex. Exclusion criteria for SNPs were MAF <1%, call rate <98%, and p<0.0001 for deviation from Hardy-Weinberg expectation. The final ISHDSF sample included 1,005 individuals from 257 pedigrees, and the final case-control sample included 4,508 individuals (2,224 cases and 2,284 controls), whose SNP data passed all QC filters.
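The exclusion criteria listed above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record types `SampleStats` and `SnpStats` and their field names are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; everything else here is illustrative.
SAMPLE_MIN_CALL_RATE = 0.95   # samples with call rate <95% are excluded
MAX_MENDELIAN_ERRORS = 1      # more than one Mendelian error is excluded
SNP_MIN_MAF = 0.01            # SNPs with MAF <1% are excluded
SNP_MIN_CALL_RATE = 0.98      # SNPs with call rate <98% are excluded
HWE_MIN_P = 1e-4              # SNPs with HWE p<0.0001 are excluded


@dataclass
class SampleStats:            # hypothetical per-sample summary
    sample_id: str
    call_rate: float
    mendelian_errors: int     # 0 for unrelated case-control samples
    reported_sex: str
    genotypic_sex: str


@dataclass
class SnpStats:               # hypothetical per-SNP summary
    snp_id: str
    maf: float
    call_rate: float
    hwe_p: float


def sample_passes_qc(s: SampleStats) -> bool:
    """Return True if the sample survives all sample-level filters."""
    return (
        s.call_rate >= SAMPLE_MIN_CALL_RATE
        and s.mendelian_errors <= MAX_MENDELIAN_ERRORS
        and s.reported_sex == s.genotypic_sex
    )


def snp_passes_qc(v: SnpStats) -> bool:
    """Return True if the SNP survives all SNP-level filters."""
    return (
        v.maf >= SNP_MIN_MAF
        and v.call_rate >= SNP_MIN_CALL_RATE
        and v.hwe_p >= HWE_MIN_P
    )
```

In practice such filters are usually applied with dedicated genetics software; the sketch only makes the direction of each inequality explicit.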

Imputation
Genotypes passing QC were phased using Eagle V.2.4 [32] and phased genotypes were then imputed to the Haplotype Reference Consortium (HRC) reference panel [33] on the Michigan Imputation Server using Minimac4 [34]. The HRC reference includes 64,975 samples from 20 different studies that are predominantly of European ancestry, suitable for imputation in our sample from Ireland. Each of the three genotype sets was imputed separately. The imputed genotype probabilities were downloaded from the Michigan Imputation Server in VCF format, and genotype dosages were extracted from the VCF files and used for PRS construction and analyses. As part of the post-imputation QC for this analysis, variants with MAF <1% and variants with an r² score <0.3 were excluded from the analyses. After imputation and all QC, 9,298,012 SNPs in ISHDSF, 11,080,279 SNPs in the case-control sample and 11,081,999 SNPs in the combined PsychArray sample remained for analysis; 9,008,825 SNPs were shared across all three datasets and were used for PRS construction and downstream analyses.
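As an illustration of the post-imputation filtering and the restriction to SNPs shared across datasets, the following Python sketch assumes each dataset is summarised as a mapping from a variant identifier to its (MAF, imputation r²) pair. This representation and the function names are assumptions made for the example, not part of the original analysis.

```python
from typing import Dict, Set, Tuple

VariantInfo = Dict[str, Tuple[float, float]]  # variant id -> (MAF, imputation r2)


def passing_variants(info: VariantInfo,
                     min_maf: float = 0.01,
                     min_r2: float = 0.3) -> Set[str]:
    """Keep variants with MAF >= 1% and imputation r2 >= 0.3."""
    return {vid for vid, (maf, r2) in info.items()
            if maf >= min_maf and r2 >= min_r2}


def shared_variants(*datasets: VariantInfo) -> Set[str]:
    """Variants that pass post-imputation QC in every dataset supplied."""
    if not datasets:
        return set()
    per_dataset = [passing_variants(d) for d in datasets]
    return set.intersection(*per_dataset)
```

Restricting to the intersection means every individual is scored on the same SNP set, so that scores are comparable across the three datasets.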

Construction of Polygenic Risk Score
The GWAS summary statistics for PGC3-SCZ (N=306,011) were first QC'd by excluding variants with MAF <0.01 and imputation quality score <0.9, as well as removing strand-ambiguous variants and insertion-deletion polymorphisms. We then constructed PRS for all subjects using a Bayesian regression framework by placing a continuous shrinkage prior on SNP effect sizes using PRS-CS [35]. PRS-CS uses LD information from an external reference panel [...]
In order to show the specificity of the PRS constructed from PGC-SCZ in our analysis, an additional PRS for low-density lipoprotein (LDL, N=87,048) from the ENGAGE Consortium [40] was also constructed using the same protocol described for PGC-SCZ above. Genetic correlation studies show that there is no significant correlation between SCZ and LDL, making LDL an appropriate phenotype as a negative control for this analysis [41][42].
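To make the scoring step concrete, the sketch below shows the generic form of a polygenic score: a weighted sum of effect-allele dosages. It is a simplified illustration under stated assumptions. The weights stand in for the posterior effect sizes that a method such as PRS-CS would produce, the helper names are hypothetical, and the PRS-CS shrinkage itself is not reimplemented here.

```python
from typing import Dict

# Allele pairs that read the same on both strands and are therefore ambiguous.
_AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """True for A/T and C/G SNPs, which are removed from the summary statistics."""
    return (a1.upper(), a2.upper()) in _AMBIGUOUS_PAIRS


def is_indel(a1: str, a2: str) -> bool:
    """True if either allele is not a single base (insertion-deletion variant)."""
    return len(a1) != 1 or len(a2) != 1


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2) over SNPs present in both inputs.

    `weights` is assumed to hold per-SNP effect sizes on the log-odds scale.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)
```

Scores computed this way are typically standardised within the analysis sample before being entered into a regression model.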

Genomic Relationship Matrix and Statistical Analyses
Statistical analyses of PRS were carried out using mixed effects logistic regression models implemented in the GMMAT package [43] in R [44]. To account for the high degree of relatedness among individuals in the study, family structure was modeled as a random effect using a genetic relationship matrix (GRM) calculated in LDAK with LD correction parameters suited for families [45]. In addition to the GRM random effect, sex was included as a fixed effect.
The final results were adjusted for multiple testing using the Holm method, as implemented in the p.adjust() function in R.
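A minimal sketch of such a model is given below. The notation is assumed here for illustration and is not taken from the original analysis: y_i denotes membership of one of the two groups being compared, PRS_i the polygenic risk score, b_i the random effect, Φ the GRM and τ a variance component.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(y_i = 1 \mid b_i)
  &= \alpha_0 + \beta_{\mathrm{PRS}}\,\mathrm{PRS}_i
     + \beta_{\mathrm{sex}}\,\mathrm{sex}_i + b_i, \\
\mathbf{b} = (b_1, \dots, b_n)^{\top}
  &\sim \mathcal{N}\!\left(\mathbf{0},\; \tau\,\boldsymbol{\Phi}\right).
\end{aligned}
```

Under this formulation, a test of β_PRS = 0 asks whether the PRS differs between the two groups after allowing for correlation among relatives through Φ.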
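For readers unfamiliar with the Holm procedure, the following Python sketch reproduces the step-down adjustment. It is intended to mirror what `p.adjust(p, method = "holm")` computes in R, but it is an independent illustration, and the function name `holm_adjust` is hypothetical.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by n - k + 1.
        candidate = min(1.0, (n - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

The smallest p-value is multiplied by the number of tests, the next by one fewer, and so on, with a running maximum so that the adjusted values preserve the ordering of the raw p-values.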

Results
The mean PRS for SCZ and LDL across the diagnostic categories are displayed in [...]

No significant PRS difference between familial and singleton cases of SCZ
Logistic mixed models for SCZ PRS (Table 2; see Supplementary Table 2 for full comparison results) showed no significant difference between familial SCZ cases from the ISHDSF and singleton SCZ cases from the population (p = 0.49), rejecting the hypothesis that an increased burden of common risk variants can account for the higher recurrence risk of SCZ in the ISHDSF.

All family members carry a high burden of common SCZ risk variants
All ISHDSF diagnostic categories, including the unaffected family members, show a significantly higher SCZ PRS than that observed in the population controls (Table 2), underlining the important role of common risk variation in the genetic architecture of SCZ in familial as well as singleton cases. Finally, we observe a significantly higher PRS in unaffected family members compared to the population controls (p < 1.07e-29), indicating the presence of a higher burden of common schizophrenia risk variants across all family members, regardless of their diagnostic status.

Discussion
This study aimed to investigate the source of increased recurrence risk of SCZ in multiplex SCZ families compared to the families of singleton cases in the population. Multiplex SCZ families represent the upper bounds of the distribution of recurrence risk for SCZ, and our results provide empirical evidence that increased recurrence risk of SCZ in multiplex SCZ families is unlikely to be attributable to an increased burden of common SCZ risk variation in the genome. Therefore, the hypothesis that high familial recurrence risk of SCZ in multiplex families may be attributable to excess rare variation specific to schizophrenia warrants further investigation. These findings also further validate the concept of a genetically influenced psychosis spectrum in multiplex SCZ families as shown by a continuous increase of common risk variation burden across all members of ISHDSF, from unaffected family members, to narrow SCZ cases in ISHDSF.
Analyses of multiplex BIP families have shown that affected and unaffected individuals in these families have significantly higher PRS for both BIP and SCZ compared to ancestry-matched population controls, and familial cases have a significantly higher BIP PRS compared to ancestry-matched singleton cases. These results, in addition to sparse evidence for the involvement of rare risk variation in the genetic architecture of BIP, demonstrate the importance of common risk variation in the genetic architecture of familial BIP [25].
In the last decade, whole exome sequencing (WES) studies of SCZ in family and case-control samples have demonstrated that in addition to common risk variation, rare variation also plays an important role in SCZ risk [46][47][48][49]. Although WES studies are only now reaching sample sizes [...]
The results presented in this study should be interpreted in the context of several limitations.
First, the liability that is captured by PRS constructed from PGC3-SCZ is currently insufficient for predicting a diagnosis of SCZ (AUC=0.71) [23], meaning that PRS alone cannot be used as a diagnostic tool. Despite that, PRS is shown to be a reliable measurement of common risk variation in the genome, making it suitable for indexing an individual's risk for SCZ in this study. Second, diagnostic ascertainment of the ISHDSF required two first-degree relatives with a diagnosis of SCZ or PO-SAD. Therefore, the various diagnostic categories in the ISHDSF diagnostic schema are not equally represented across all families [28].
Tables:
Table 1: Sample description of ISHDSF. The concentric diagnostic hierarchy of ISHDSF contains 4 case definitions: narrow, intermediate, broad and very broad. These case definitions in ISHDSF reflect core (narrow and intermediate) and periphery (broad and very broad) of the psychosis spectrum based on previous genetic epidemiology work. Poor-outcome schizoaffective probands are placed in the narrow category (60), whereas good-outcome schizoaffective probands are placed in the intermediate category (30).
Table 2: Comparison of the PRS in singleton cases and different diagnostic categories of ISHDSF from mixed-effects logistic regression models. The first row follows the hypothesis that familial cases have higher PRS compared to singleton cases. The comparisons under PGC3-SCZ follow the hypothesis that singleton cases and familial categories in ISHDSF have a higher PRS for SCZ compared to population controls. The comparisons under LDL are used as a negative control and follow the same hypothesis that singleton cases and familial categories in ISHDSF have a higher PRS for LDL compared to population controls.