Saturday 23 January 2021

NGS and an I-Y33765 clade-specific SNP mutation rate

In an earlier post I wrote about single nucleotide polymorphisms (SNPs) and their significance for genetic genealogy.  This type of mutation has become the "gold standard" genetic marker, particularly since the introduction of Next Generation Sequencing (NGS) analysis (Xue & Tyler-Smith, 2010). NGS, or massively parallel sequencing, uses technology devised by scientists at the University of Cambridge in the early years of this century.  To commercialize their invention these pioneers founded Solexa, a company that developed prototype DNA sequencers.  In 2007 their Solexa technology was acquired by Illumina, Inc., and today sequencers manufactured by this American company perform the majority of NGS worldwide.  The significance of NGS to medicine, forensics and academic research across many disciplines is that this DNA sequencing system has radically reduced the time and cost of genome sequencing.

For genealogists NGS has produced similar benefits.  The launch in November 2013 of the FamilyTreeDNA BigY analysis introduced the power and cost-effectiveness of NGS to our hobby.  At that time there was some excitement about the possibilities it promised for discovering genetic branches in clades where STR haplotypes had given insufficient resolution.  The subsequent "Big Y SNP tsunami" confirmed this promise and very soon demonstrated "that it should be possible to find a SNP specific for almost any Y chromosome, distinguishing even between fathers and sons" (Xue & Tyler-Smith, 2010). In addition, SNPs provide a "molecular clock" with which we can calibrate our ancestry based on the frequency with which novel base substitutions occur.  Because most of the Y-chromosome is not affected by the recombination that rearranges the DNA in other chromosomes at each generation, it is ideal for calculating the time to most recent common ancestor (TMRCA) from related Y-DNA samples (Jobling & Tyler-Smith, 2017). Indeed, several years before the BigY test became available scientists began using NGS to estimate the human Y-DNA mutation rate.  In 2009, Xue et al. employed that technology to estimate the Y-chromosome mutation rate in a 13-generation Chinese pedigree belonging to Haplogroup O3a.  Their pedigree contained four high-confidence SNP mutations, and using these they calculated a mutation rate of 1.0 × 10⁻⁹ mutations per nucleotide per year, which is equivalent to one new SNP being formed every 118.1y.
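The conversion between a per-nucleotide rate and "years per SNP" can be sketched in a few lines of Python. This is only an illustration: it assumes the 118.1y figure refers to the same 8,467,165 bp of sequence that is used in the calculation later in this post, and the function names are my own.

```python
# Length of Y-chromosome sequence considered, in base pairs.
# Assumption: the same 8,467,165 bp coverage used later in this post.
COVERAGE_BP = 8_467_165


def years_per_snp(rate: float, coverage_bp: int = COVERAGE_BP) -> float:
    """Expected years between new SNPs for a rate given per bp per year."""
    return 1.0 / (rate * coverage_bp)


def rate_from_interval(years_between_snps: float,
                       coverage_bp: int = COVERAGE_BP) -> float:
    """Inverse conversion: per bp per year rate from years per SNP."""
    return 1.0 / (years_between_snps * coverage_bp)


xue_interval = years_per_snp(1.0e-9)
print(f"{xue_interval:.1f} years per SNP")  # 118.1 years per SNP
```

Under that assumption the two ways of quoting the rate are interchangeable, which makes it easier to compare published rates with intervals counted directly from a pedigree.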

Since then other estimates for the SNP mutation rate on the human Y-chromosome have been obtained using evidence gathered from evolutionary or genealogical time scales (see Figure 1).  Mutation rates that have been published based on pedigrees tend to be slightly faster than evolutionary rates based on ancient DNA or other prehistoric samples (Balanovsky, 2017). 



Figure 1: Y-chromosome SNP mutation rates compared with the estimate based on the Swensson, I-Y33765, 19 generation Swedish pedigree.       

The SNP mutation rate across the entire genome is dependent on a range of biological, environmental, physical and cultural variables (Goldman et al., 2019).  These factors modulate the rate in both male and female germlines but appear to have greatest significance for the male, where spermatogenesis continues from adolescence into old age.  Consequently, as men age there is a positive correlation between the increasing number of germline replications and the number of de novo mutations passed on in their sperm.  At age fifty the number of new mutations introduced in offspring is very roughly twice that at age twenty.  So the age of fathers at the conception of their children has a significant influence on mutation frequency.  This observation illustrates the importance of knowing the generation time within a particular pedigree, as lineages in which the generation times are longer may exhibit higher mutation rates.

It is perhaps worth noting that studies of historical lineages with particularly lengthy generation times show that they may be more likely to die out, because the poorer genetic fitness of late-born children reduces the probability that they marry and procreate (Goldman et al., 2019). Although we might expect shorter life expectancy in historical periods to have limited older paternity, this may not have been the case.  In England in the fourteenth century, men who were tenant farmers (hence of above average social status, "the middling sort") and who reached the age of 25y had a remaining life expectancy of 23.3-25.7y (Jonker, 2003), making it not at all improbable for such men to be fathering children in their late forties.  In short, children of older fathers are likely to inherit more de novo mutations and so "mutation rate cannot be treated as a constant scaling factor, but rather must be considered along with the paternal generation interval as a time dependent variable" (Kong et al., 2012).

Many genealogists who use commercial tests that generate NGS data in BAM or VCF file formats choose to upload these to the Y-chromosome sequence interpretation service marketed by YFull.com, Moscow.  This online interface gives users quality and age estimations for their derived alleles using a methodology and SNP mutation rate based on that described by Adamov et al. (2015).  These authors devised criteria to screen SNPs within Y-chromosome NGS data from two commercial laboratories, FamilyTreeDNA and Full Genome Corporation, so that they could "select actual mutations and exclude false positives in individual samples".  Their screening method only selects mutations for use in age calculations if they are recorded in 857 so-called combBED regions of the Y-chromosome.  These regions are in the X-degenerate, non-recombining part of the Y-chromosome and so we would expect the nucleotide sequences they contain to be nearly identical between a patriarch and his male descendants.  Also, with NGS the possibility of false-positive or false-negative calls (incorrect identification of individual base variants) is lowest within the X-degenerate portion of the Y-chromosome, and so using only novel variants identified in that region should keep errors as low as possible.  But even so, according to these authors, perhaps one novel variant in five may still be false.  To eliminate these they specify a further seven criteria which they claim "collectively eliminated up to one-third of entered variants, an average of 20%".
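To make the idea of screening concrete, here is a minimal Python sketch of a filter of this kind. It is not the Adamov et al. procedure or YFull's code: it applies only the four checks used later in this post for the Swensson pedigree (position inside a combBED region, more than two reads, read quality above 90%, no associated INDEL), and the region coordinates, variant names and record layout are invented purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for combBED regions as (start, end) coordinates.
# The real list has 857 regions and is not reproduced here.
COMBBED_REGIONS = [(1_000, 2_000), (5_000, 9_000)]


@dataclass
class Variant:
    name: str
    position: int    # Y-chromosome coordinate of the variant
    reads: int       # number of reads covering the position
    quality: float   # read quality as a fraction between 0.0 and 1.0
    is_indel: bool   # True if associated with an insertion or deletion


def in_combbed(position: int, regions=COMBBED_REGIONS) -> bool:
    """True if the position falls inside any of the listed regions."""
    return any(start <= position <= end for start, end in regions)


def is_high_confidence(v: Variant) -> bool:
    """Apply the four checks: region, reads > 2, quality > 90%, no INDEL."""
    return (in_combbed(v.position)
            and v.reads > 2
            and v.quality > 0.90
            and not v.is_indel)


# Invented example records, one failing each check in turn.
candidates = [
    Variant("example-1", 1_500, 12, 0.99, False),  # passes all four checks
    Variant("example-2", 3_000, 20, 0.99, False),  # outside the regions
    Variant("example-3", 6_000, 2, 0.99, False),   # too few reads
    Variant("example-4", 7_000, 9, 0.85, False),   # read quality too low
    Variant("example-5", 8_000, 9, 0.95, True),    # associated with an INDEL
]

kept = [v.name for v in candidates if is_high_confidence(v)]
print(kept)  # ['example-1']
```

Only variants that survive every check are counted when a mutation rate is estimated, which is why the number of usable SNPs is usually smaller than the number reported.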

At the end of 2020 a Big Y-700 test within the I-Y33765 clade was completed for Hallberg YF80422.  He and another Swedish man, Jacobsson IN70815, have direct-line male ancestry from their shared patriarch, Nils Swensson, who was born in 1631 at Hallingshult, Locknevi, Kalmar, Sweden (Medin, C. and Hallberg, M., personal communications, December, 2020).  Using 111 Short Tandem Repeat (STR) markers these men are separated by a genetic distance (GD) of five.  Comparison of their Big Y-700 VCF files using the YFull.com analysis shows that their most recent shared mutation downstream of I-Y33765 is I-BY198548, and consequently this SNP may have formed circa 1630 (see Figure 3).


Table 1: Single Nucleotide Polymorphisms (SNPs) within the pedigree of two male descendants of Nils Swensson, 1631-1713.  Based on their pedigree the Y-chromosomes of these two men are separated by nineteen generations equivalent to 657y.   The position (combBED region), reads (>2), read quality and absence of INDELS of the six mutations shown in red are criteria that confirm these SNPs are suitable for use in estimating the pedigree specific mutation rate.

According to their documented pedigree the Y-chromosomes carried by these men are separated by 657y, equivalent to nineteen generations, which indicates an average generation time of 34.58y.  YFull interpretation of their Y-chromosome sequence information from VCF files has identified ten SNPs that formed during the pedigree.  Of these, six SNPs are located within combBED regions of the Y-chromosome and each of these is recorded with >2 reads and a read quality >90% (see Table 1).  In addition, none of these six SNPs is associated with nucleotide insertions or deletions (INDELS).  So, as they each satisfy four important "filtration criteria" developed by Adamov et al. (2015), it would seem reasonable to consider them true, high-confidence mutations that are suitable to use when estimating the pedigree mutation rate.  On this basis, during the period recorded in the pedigree we observe on average one mutation every 109.5y (657y/6 SNPs). This directly observed interval k (years per SNP) can then be used to calculate the base substitution rate constant µ as follows:

                    µ   =   (1 / coverage) / k

                    µ   =   (1 / 8,467,165 bp) / 109.5 y

                    µ   =   1.0798 × 10⁻⁹ bp⁻¹ y⁻¹
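For readers who would like to repeat the arithmetic, the calculation can be sketched in Python as below. The variable names are my own, and the numbers are simply those quoted in this post; the result comes out close to 1.08 × 10⁻⁹ per base pair per year.

```python
# Figures taken from the Swensson pedigree as described in this post.
YEARS_SEPARATING = 657        # years separating the two Y-chromosomes
GENERATIONS = 19              # generations separating the two men
HIGH_CONFIDENCE_SNPS = 6      # SNPs passing the four screening checks
COVERAGE_BP = 8_467_165       # base pairs of sequence considered

# Average generation time within the pedigree.
generation_time = YEARS_SEPARATING / GENERATIONS

# k: average number of years between new SNPs in the pedigree.
k = YEARS_SEPARATING / HIGH_CONFIDENCE_SNPS

# mu: base substitution rate per base pair per year.
mu = (1 / COVERAGE_BP) / k

print(f"generation time: {generation_time:.2f} y")
print(f"k: {k:.1f} y per SNP")
print(f"mu: {mu:.2e} per bp per year")
```

Written this way it is also clear that µ is just the number of SNPs divided by the product of years and base pairs, so the estimate scales directly with any change in the SNP count or in the length of sequence considered.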

Figure 3:  Nineteen generation pedigree for two direct-line male descendants of Nils Swensson, 1631-1713.  This diagram is based on genealogical research generously shared by C. Medin & M. Hallberg, December, 2020. 

The characteristics of our Swedish pedigree, and of the base substitution rate derived from it, are similar to, and I would argue consistent with, those reported by Xue et al. (2009) using their 13-generation Chinese pedigree.  However, their mutation rate and ours are both faster (see Figure 1) than the consensus range (0.75-0.89 × 10⁻⁹ substitutions per base pair per year) suggested by Balanovsky (2017).  Does this discrepancy matter and what may it tell us?  As mentioned earlier, it is known that the frequency with which SNPs occur can be influenced by a range of variables. Among these, the age at which men father their children (Campbell & Eichler, 2013; Goldman et al., 2019) is probably the most significant, but variation in base substitution rate has also been reported between Y-DNA haplogroups (Ding et al., 2020) and between families (Conrad et al., 2012).
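The size of the discrepancy can be put into numbers with a short Python sketch. It assumes that the consensus range is expressed in units of 10⁻⁹ substitutions per base pair per year, so that it is directly comparable with the two pedigree rates; the helper function is my own.

```python
# Consensus range, assumed to be in units of 1e-9 per bp per year.
CONSENSUS_LOW = 0.75e-9
CONSENSUS_HIGH = 0.89e-9

# Pedigree-based rates quoted in this post.
pedigree_rates = {
    "Xue et al. (2009)": 1.0e-9,
    "Swensson pedigree": 1.0798e-9,
}


def percent_above_consensus(rate: float) -> float:
    """Percentage by which a rate exceeds the top of the consensus range."""
    return (rate / CONSENSUS_HIGH - 1.0) * 100.0


for label, rate in pedigree_rates.items():
    print(f"{label}: {percent_above_consensus(rate):.1f}% above the range")
```

On these figures the Chinese pedigree rate sits roughly 12% above the upper end of the range and the Swedish pedigree rate roughly 21% above it.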

The average generation time obtained from our Swedish pedigree is 34.6y.  In a cross-cultural study of human generation time Fenner (2005) reported only a very small variation between that for men in contemporary hunter-gatherer societies and that for men in developed countries, and he suggested a working figure of 31y for male generation time irrespective of cultural setting.  This may suggest that the time recorded in our Swedish pedigree, which is about 12% longer than Fenner's consensus figure, could be significant and possibly indicate a familial or cultural bias to later fatherhood.
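The comparison with Fenner's working figure is simple arithmetic, sketched here in Python using only numbers already given in this post.

```python
# Pedigree figures from this post and Fenner's working figure.
YEARS_SEPARATING = 657
GENERATIONS = 19
FENNER_GENERATION_TIME = 31.0   # years, male generation time

pedigree_generation_time = YEARS_SEPARATING / GENERATIONS

# Relative difference between the pedigree and Fenner's figure.
relative_difference = pedigree_generation_time / FENNER_GENERATION_TIME - 1.0

print(f"pedigree generation time: {pedigree_generation_time:.1f} y")
print(f"longer than Fenner's figure by {relative_difference:.1%}")
```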

It seems to me that, while the rate we have estimated from the Swensson Swedish pedigree is faster than any published genealogical SNP mutation rate, it has the advantage of having been calculated using documented genealogy and observed high-confidence SNPs that are specific to the Swedish arm of the I-Y33765 clade.  For this reason I consider it to be the mutation rate that is most applicable when calculating TMRCA within the I-Y33765 clade.
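As a rough illustration of how such a rate might be applied, the Python sketch below estimates TMRCA for two men from the number of qualifying private SNPs on each line. This is a deliberately simple estimator of my own, not the method used by YFull or by any of the cited authors: it assumes both lines are of similar length, that SNPs are counted only in the same screened regions used above, and it ignores statistical uncertainty, which is large when counts are small.

```python
# Average years between new SNPs, as estimated from the Swensson pedigree.
YEARS_PER_SNP = 109.5


def tmrca_years(private_snps_a: int, private_snps_b: int,
                years_per_snp: float = YEARS_PER_SNP) -> float:
    """Simple TMRCA estimate: mean private SNP count times years per SNP."""
    mean_snps = (private_snps_a + private_snps_b) / 2
    return mean_snps * years_per_snp


# Consistency check against the pedigree itself: six SNPs in total across
# the two lines give half of the 657 y that separate the two men,
# however the six are split between the lines.
print(tmrca_years(3, 3))  # 328.5
print(tmrca_years(2, 4))  # 328.5
```

Because the estimate depends only on the mean count, a single extra or missing SNP shifts the result by about 55 years, so it is best treated as a guide rather than a date.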

In summary, it has recently been possible to calculate an I-Y33765 specific SNP mutation rate using Big Y-700 NGS data and a 19-generation Swedish pedigree.  By applying criteria to screen for SNP quality (Adamov et al., 2015), six high-confidence derived alleles that formed within the pedigree have been identified.  The resulting base substitution constant is 1.0798 × 10⁻⁹ mutations per nucleotide per year.  This is equivalent to one new SNP being formed every 109.5y.  This mutation rate should be useful when calculating TMRCA within the I-Y33765 clade.

References

Adamov, D., Guryanov, V., Karzhavin, S., Tagankin, V., Urasin, V. (2015) Defining a new rate constant for Y-chromosome SNPs based on full sequencing data. Russian Journal of Genetic Genealogy 1:3–36

Balanovsky, O., Zhabagin, M., Agdzhoyan, A., Chukhryaeva, M., Zaporozhchenko, V., Utevska, O., Highnam, G., Sabitov, Z., Greenspan, E., Dibirova, K., Skhalyakho, R., Kuznetsova, M., Koshel, S., Yusupov, Y., Nymadawa, P., Zhumadilov, Z., Pocheshkhova, E., Haber, M., Zalloua, P.A., Yepiskoposyan, L., Dybo, A., Tyler-Smith, C., Balanovska, E. (2015) Deep phylogenetic analysis of haplogroup G1 provides estimates of SNP and STR mutation rates on the human Y-chromosome and reveals migrations of Iranic speakers. PLoS ONE 10(4): e0122968. https://doi.org/10.1371/journal.pone.0122968

Balanovsky, O. (2017) Toward a consensus on SNP and STR mutation rates on the human Y-chromosome, Human Genetics 136, 575-590

Conrad, DF., Keebler, JEM., DePristo, MA., Lindsay, SJ., Zhang, Y., Cassals, F., Idaghdour, Y., Hartl, CL., Torroja, C., Garimella, KV., Zilversmit, M., Cartwright, R., Rouleau, G., Daly, M., Stone, EA., Hurles, ME and Awadalla, P. (2012) Variation in genome-wide mutation rates within and between human families, Nature Genetics 43, 712-714

Ding, Q., Hu, Y., Koren, A. and Clark, A.G. (2020) Mutation Rate Variability across Human Y-Chromosome Haplogroups, Molecular Biology and Evolution, https://doi.org/10.1093/molbev/msaa268

Fenner, J.K. (2005) Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies, American Journal of Physical Anthropology 128, 415-423

Francalacci, P., Morelli, L., Angius, A., Berutti, R., Reinier, F., Atzeni, R., Pilu, R., Busonero, F., Maschio, A., Zara, I., Sanna, D., Useli, A., Urru, M., Marcelli, M., Cusano, R., Oppo, M., Zoledziewska, M., Pitzalis, M., Deidda, F., Porcu, E., Poddie, F., Kang, H., Lyons, R., Tarier, B., Gresham, J., Li, B., Tofanelli, S., Alonso, S., Dei, M., Lai, S., Mulas, A., Whalen, M., Uzzau, S., Jones, C., Schlessinger, D., Abecasis, G., Sanna, S., Sidore, C., Cucca, F. (2013) Low-pass DNA sequencing of 1200 Sardinians reconstructs European Y-chromosome phylogeny. Science 341, 565–569

Fu, Q., Li, H., Moorjani, P., Jay, F., Slepchenko, SM., Aleksei, A., Johnson, PLF., Petri, AA., De Filippo, C., Meyer, M., Zwyns, N., Salazar-Garcia, DC., Yaroslav, V., Keates, SG., Kosintsev, PA., Razhev, DI., Michael, P., Peristov, NV., Lachmann, M., Douka, K., Thomas, FG., Slatkin, M., Hublin, J., Reich, D., Kelso, J., Viola, B. (2014) The genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449

Goldman, J.M., Veltman, J.A. and Gilissen, C. (2019) De Novo Mutations Reflect Development and Aging of the Human Germline, Trends in Genetics 35, 828-839

Helgason, A., Einarsson, AW., Guðmundsdóttir, VB., Sigurðsson, Á., Gunnarsdóttir, ED., Jagadeesan, A., Ebenesersdóttir, SS., Kong, A., Stefánsson, K. (2015) The Y-chromosome point mutation rate in humans. Nature Genetics 47, 453–457.

Heyer, E., Puymirat, J., Dieltjes, P., Bakker, E., and de Knijff, P. (1997) Estimating Y chromosome specific microsatellite mutation frequencies using deep rooting pedigrees, Human Molecular Genetics, 6, 799–803

Jobling, MA., Tyler-Smith, C. (2017) The human Y chromosome: an evolutionary marker comes of age. Nature Reviews Genetics 4, 598-612

Jonker, M.A. (2003) Estimation of Life Expectancy in the Middle Ages, Journal of the Royal Statistical Society, Series A (Statistics in Society) 166, 105-117

Karmin, M., Saag, L., Vicente, M., Wilson Sayres, MA., Järve, M., Talas, UG., Rootsi, S., Ilumäe, AM., Mägi, R., Mitt, M., Pagani, L., Puurand, T., Faltyskova, Z., Clemente, F., Cardona, A., Metspalu, E., Sahakyan, H., Yunusbayev, B., Hudjashov, G., DeGiorgio, M., Loogväli, EL., Eichstaedt, C., Eelmets, M., Chaubey, G., Tambets, K., Litvinov, S., Mormina, M., Xue, Y., Ayub, Q., Zoraqi, G., Korneliussen, TS., Akhatova, F., Lachance, J., Tishkoff, S., Momynaliev, K., Ricaut, FX., Kusuma, P., Razafindrazaka, H., Pierron, D., Cox, MP., Sultana, GNN., Willerslev, R., Muller, C., Westaway, M., Lambert, D., Skaro, V., Kovačević, L., Turdikulova, S., Dalimova, D., Khusainova, R., Trofimova, N., Akhmetova, V., Khidiyatova, I., Lichman, DV., Isakova, J., Pocheshkhova, E., Sabitov, Z., Barashkov, NA., Nymadawa, P., Mihailov, E., Seng, JWT., Evseeva, I., Migliano, AB., Abdullah, S., Andriadze, G., Primorac, D., Atramentova, L., Utevska, O., Yepiskoposyan, L., Marjanović, D., Kushniarevich, A., Behar, DM., Gilissen, C., Vissers, L., Veltman, JA., Balanovska, E., Derenko, M., Malyarchuk, B., Metspalu, A., Fedorova, S., Eriksson, A., Manica, A., Mendez, FL., Karafet, TM., Veeramah, KR., Bradman, N., Hammer, MF., Osipova, LP., Balanovsky, O., Khusnutdinova, EK., Johnsen, K., Remm, M., Thomas, MG., Tyler-Smith, C., Underhill, PA., Willerslev, E., Nielsen, R., Metspalu, M. (2015) A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Research 25, 459–466

Kong, A., Frigge, M.L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S.A., Sigurdsson, A., Jonasdottir, A., Jonasdottir, A., Wong, W., Sigurdsson, G., Walters, G.B., Steinberg, S., Helgason, H., Thorleifsson, G., Gudbjartsson, D.F., Helgason, A., Magnusson, O.T., Thorsteinsdottir, U., and Stefansson, K. (2012) Rate of de novo mutations, father’s age, and disease risk, Nature 488, 471–475

Poznik, GD., Henn, BM., Yee, M., Sliwerska, E., Ghia, M., Lin, AA., Snyder, M., Quintana-Murci, L., Kidd, JM., Underhill, PA., Bustamante, CD. (2013) Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science 341, 562–565

Xue, Y., Wang, Q., Long, Q., Ng, BL., Swerdlow, H., Burton, J., Skuce, C., Taylor, R., Abdellah, Z., Zhao, Y., Macarthur, DG., Quail, MA., Carter, NP., Yang, H. (2009) Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Current Biology 19, 1453–1457

Xue, Y., Tyler-Smith, C. (2010) The hare and the tortoise: one small step for four SNPs, one giant leap for SNP-kind, Forensic Science International: Genetics 4, 59–61
