Sunday, 27 September 2020

SNPs, STRs and the ancestral tree of I-Y33765

The I-Y33765 mutation is found within the "non-recombining" portion of the human Y chromosome.  Its numeric position depends on which version of the Genome Reference Consortium (GRC, 2007) Human Reference sequence is used; at present this will be either build 37 [aka Human Genome Assembly hg19] or build 38 [aka Human Genome Assembly hg38].  The SNP (single nucleotide polymorphism) lies at position 16276363 [hg19] or 14164483 [hg38] on the Y chromosome (see image below).  The reference sequences record a guanine (G) base at this position, which is replaced in the mutated form by a cytosine (C) base.  The reference form, G, is termed "ancestral" or "negative" while the mutated form, C, is termed "derived" or "positive".
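As a rough illustration of the ancestral/derived terminology, the short Python sketch below classifies a single base call at the Y33765 position.  The positions and bases are those given in the paragraph above; the dictionary layout and the function name classify_call are my own assumptions for the example and do not come from any testing company's software.

```python
# Y33765 as described in the text; the position depends on the reference build.
Y33765 = {
    "positions": {"hg19": 16276363, "hg38": 14164483},
    "ancestral": "G",  # reference form, also called "negative"
    "derived": "C",    # mutated form, also called "positive"
}


def classify_call(snp, build, position, base):
    """Return 'derived', 'ancestral' or 'unknown' for one observed base call.

    Hypothetical helper for illustration only.
    """
    if snp["positions"].get(build) != position:
        raise ValueError("position does not match this SNP for build " + build)
    base = base.upper()
    if base == snp["derived"]:
        return "derived"
    if base == snp["ancestral"]:
        return "ancestral"
    return "unknown"


print(classify_call(Y33765, "hg38", 14164483, "C"))  # derived
print(classify_call(Y33765, "hg19", 16276363, "G"))  # ancestral
```

The same man gives the same result whichever build is used, provided the position quoted belongs to that build.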

This mutation or "base substitution" was first identified in 2017 by YFull, Moscow, Russian Federation in a comparison of two FamilyTreeDNA Big Y-500 test BAM files.  YFull named the mutation Y33765; the letter Y indicating that it was first identified in their analysis and 33765 being the next available integer in their company's numeric sequence. 

Unlike the DNA in other chromosomes of the human genome, which is "shuffled" in a process known as "cross-over" when the egg or sperm are formed (hence at each generation) in order to create genetic diversity in a population, the majority of DNA in the Y chromosome is within the "non-recombining region" that remains largely unchanged.  This means that its genetic sequence is generally "conserved" or stable and at each generation passes from father to son with only occasional small random changes.

The genetic evolution of this non-recombining part of the Y chromosome is classified into groups termed "haplogroups" that each contain large numbers of these mutations, each of which is termed a single nucleotide polymorphism (SNP).  The haplogroups are themselves each defined by a particular SNP [aka mutation or base substitution].  These major haplogroups are identified by a letter; the oldest male haplogroup is designated A and later branches carry later letters, currently as far as T.  Because of the generally stable conserved nature of the Y chromosome, together with the occasional random SNP changes, it has been possible for geneticists to construct an ancestral tree for male Homo sapiens.  The Y33765 SNP is a very recent mutation within the I haplogroup.  It seems very probable that the first male Homo sapiens to enter Europe belonged to this I haplogroup.

 

More precisely, the I-Y33765 SNP is within the extensive subclade or branch of haplogroup I termed I2a1a, which is itself defined by the SNP P37.2.  This major branch of the I haplogroup is estimated to have formed (circa 18,400 ybp) during the last glacial maximum somewhere in southern Europe.  There are at least six phylogenetically significant SNPs in the ancestral tree of Y33765 between it and P37.2, as shown in the simplified table below.

 

From the table it is easy to understand that SNPs are very useful tools that can be used to determine a man's deep ancestry on a scale of many thousands of years.  In Scandinavia and the Baltic region the I2 male ancestral lineage is associated with hunter-gatherer populations.  In central southern Sweden, ancient DNA samples from male skeletal remains excavated at Motala (Mathieson et al., 2015) have been characterised as belonging to this haplogroup.  The Motala hunter-gatherer encampment has been dated to the Mesolithic period.  In Lithuania the skeletal remains of a man buried near a supposed shamanic ritual site have produced Y-DNA belonging to the I-L233 haplogroup (Butrimas, 1992; Mittnik et al., 2018).  Radiocarbon dating indicates that he was alive during the late Mesolithic or early Neolithic.

If two chaps are each derived for a given SNP then they must be related through a shared or common male ancestor.  If they are each derived for a SNP that formed very recently, such as I-Y33765, then their common ancestor must have lived at some time after the original formation of that SNP.  By comparing the SNPs that are shared by several men it is possible to draw a phylogenetic tree that maps their genetic relationship.
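The reasoning in the paragraph above can be sketched in a few lines of Python.  Each man is represented by the list of SNPs for which he is derived, ordered from oldest to most recent, and the most recent SNP the two men share marks the branch on which their common ancestor sits.  P37.2 and Y33765 are taken from the text; SNP_X and SNP_Z are invented placeholders, the three men are hypothetical, and the function names are my own.

```python
def shared_path(path_a, path_b):
    """Return the SNPs two men share, reading from the oldest SNP downwards."""
    shared = []
    for snp_a, snp_b in zip(path_a, path_b):
        if snp_a != snp_b:
            break  # the two lineages part company here
        shared.append(snp_a)
    return shared


def most_recent_shared_snp(path_a, path_b):
    """Return the youngest SNP both men are derived for, or None if there is none."""
    shared = shared_path(path_a, path_b)
    return shared[-1] if shared else None


# Hypothetical men; SNP_X and SNP_Z are placeholder names, not real SNPs.
man_1 = ["P37.2", "SNP_X", "Y33765"]
man_2 = ["P37.2", "SNP_X", "Y33765"]
man_3 = ["P37.2", "SNP_X", "SNP_Z"]

print(most_recent_shared_snp(man_1, man_2))  # Y33765
print(most_recent_shared_snp(man_1, man_3))  # SNP_X
```

Here men 1 and 2 share an ancestor who lived after Y33765 formed, whereas man 3 joins them only further back, at the placeholder SNP_X.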

While the presence or absence of a combination of SNPs is the definitive test used to confirm a man's haplogroup and hence his genetic relationships, SNPs are not the only form of alteration within the Y chromosome that can provide this information.  STRs (short tandem repeats) are another form of Y-chromosome marker which can be used to discover such information.  Unlike SNPs, which involve just a single base change in the DNA sequence, STRs are areas of the DNA molecule that contain repeated sequences of between two and perhaps a dozen or more bases.  The number of repeats of these short series of bases at a particular site or "locus" can vary between individuals as a result of random mutation.  Not all STR sites mutate at the same rate.  So, by using a combination of STR markers that mutate at differing speeds, it is possible to determine a chap's STR profile or "haplotype".  A man's STR haplotype can be used to predict his haplogroup (which SNPs he is likely to be derived or positive for) and to compare his genetic relationship with other men.
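To make the idea of a repeat count concrete, here is a minimal Python sketch that counts how many back-to-back copies of a short motif occur in a stretch of sequence.  The GATA motif, the flanking bases and the two sequences are invented for illustration, and the function name is my own; laboratory STR calling is a good deal more involved than this.

```python
def longest_repeat_run(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    size = len(motif)
    for start in range(len(sequence)):
        count = 0
        position = start
        # Step along the sequence one motif at a time while copies continue.
        while sequence[position:position + size] == motif:
            count += 1
            position += size
        best = max(best, count)
    return best


# Invented sequences: the same flanks, with 5 and 6 copies of the motif.
man_a = "ACGT" + "GATA" * 5 + "TTC"
man_b = "ACGT" + "GATA" * 6 + "TTC"

print(longest_repeat_run(man_a, "GATA"))  # 5
print(longest_repeat_run(man_b, "GATA"))  # 6
```

The two invented men differ by one repeat at this site, which is the kind of difference that an STR test reports as a marker value.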

FamilyTreeDNA (Houston, USA) and YSeq (Berlin, FRG) sell a range of STR test "panels" that contain from 12 to 111 STR markers.  While a 12 STR panel test can be used to predict a man's haplogroup, it is unlikely to allow his relatedness to other chaps to be judged with any precision.  By using more STR marker panels the precision is improved.  By testing with the 111 STR panel it is possible to differentiate genetic relationships between closely related men.  As a consequence, STR comparison is particularly useful to assess relatedness within the genealogical time-frame equivalent to the last 500 years or so.

From the comments above it follows that by using 111 STR markers it should be possible to predict whether a man will be derived for a relatively recent SNP like I-Y33765.  This is most effectively done if his STR haplotype is compared to the mode (the value most likely to be sampled) haplotype obtained from the STR results of men who have been confirmed as derived for I-Y33765 by SNP testing.  The following table gives the I-Y33765 mode STR values based on results from six men.
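The comparison described above can be sketched in Python as follows.  The marker names are real Y-STR markers, but every repeat value here is invented for illustration and is not taken from the I-Y33765 results; the function names are my own, and summing the absolute differences at each marker is only one simple way of scoring how far a man sits from the mode.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common repeat value at each marker across the men given."""
    markers = haplotypes[0].keys()
    return {
        marker: Counter(h[marker] for h in haplotypes).most_common(1)[0][0]
        for marker in markers
    }


def step_distance(haplotype_a, haplotype_b):
    """Sum the absolute repeat differences over the markers both men share."""
    return sum(
        abs(haplotype_a[marker] - haplotype_b[marker])
        for marker in haplotype_a
        if marker in haplotype_b
    )


# Invented values for three hypothetical SNP-confirmed men.
confirmed = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 16},
    {"DYS393": 13, "DYS390": 23, "DYS19": 16},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
]
mode = modal_haplotype(confirmed)

# Two hypothetical untested men to compare against the mode.
close_match = {"DYS393": 13, "DYS390": 24, "DYS19": 17}
poor_match = {"DYS393": 14, "DYS390": 22, "DYS19": 16}

print(mode)                              # {'DYS393': 13, 'DYS390': 24, 'DYS19': 16}
print(step_distance(mode, close_match))  # 1
print(step_distance(mode, poor_match))   # 3
```

A small distance from the mode only suggests that a man may be derived for the SNP; confirmation still requires an SNP test.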

References

 
Butrimas, A. (1992) Spigino mezolito kapai, Lietuvos archeologija, 8, 4

GRC (2007) The GRC is a collaboration between the Wellcome Sanger Institute, the McDonnell Genome Institute at Washington University, the European Bioinformatics Institute and the National Center for Biotechnology Information

Mathieson, I., Lazaridis, I., Rohland, N. et al. (2015) Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499

Mittnik, A., Wang, C., Pfrengle, S. et al. (2018) The genetic prehistory of the Baltic Sea region. Nat Commun 9, 442 
