Friday, 25 August 2023
Sunday, 31 July 2022
FamilyTreeDNA "Discover" and TMRCA estimates for I-Y33765
In previous articles I have discussed Time to Most Recent Common Ancestor (TMRCA) estimates for the phylogenetically important mutations in our I-Y33765 clade and how we can calculate these time intervals using the mutation rate of SNP and STR Y-DNA genetic markers. Being able to calculate approximate TMRCA is obviously a great benefit when we are researching the supposed historical context of events involved in our genealogy.
One characteristic of the types of calculations we have used for obtaining TMRCA intervals is that these have relied upon either SNP-only or STR-only methods. It has been reported (Balanovsky, 2017) that while experimentally obtained SNP marker mutation rates largely overlap and produce usable TMRCA estimates for both genealogy and evolutionary studies, STR markers are more useful when constructing fine-scale phylogenies that are typically associated with genealogical pedigrees. However, both types of marker have their own inherent problems when they are used as the basis for TMRCA estimations.
First, when using SNP markers to estimate periods within the genealogical timescale it is important to take care to select only the very high confidence mutations. Unfortunately these may not always be available or easy to identify with confidence. Because genealogical timescales are relatively short, the omission or addition of just a single SNP can significantly alter estimates. Next, when using STR markers the major difficulty is presented by convergent and/or multiple mutations which can introduce error that again can be hard to identify with confidence, especially when a small number of individuals are being compared.
For these reasons, the confidence intervals provided for our estimates are poor and there is a consequent need for an alternative method to estimate TMRCA that would give improved accuracy. In 2021, a paper was published in Genes by Iain McDonald (McDonald, 2021) in which he presented a novel mathematical approach that combines probability calculations based on using SNP and STR markers with other probabilities derived from historical data and from ancient DNA to achieve more precise and accurate TMRCA intervals for genealogy.
Earlier this month (July 2022) FamilyTreeDNA announced the release of an online feature they call "Discover" that provides "information about the haplogroup from your Y-DNA test". The application is available to FTDNA customers and to others who simply need to register with that company. The tool allows users to input a Y-DNA haplogroup designation from which the application generates report pages which give a summary of relevant information, including geographic frequency, notable related individuals, migration routes and ancient DNA examples. In addition, a section called "scientific details" gives options that list the base variants associated with that haplogroup, show its position within the Y-DNA haplotree and, most importantly for our interests, provide an estimation of the TMRCA at various confidence levels. In presenting this latter feature (see Figure 1) the page rubric explains that it "is calculated based on SNP and STR test results from many present-day DNA testers" and "the state-of-the-art FamilyTreeDNA algorithm for inferring age estimates for the Y-DNA Haplotree. [was] Developed together with Iain McDonald."
It seems to me this description and the helpful credit provide an indirect but clear reference to the combined probability model Iain McDonald describes in detail in his paper mentioned above. As a result I think it worth briefly mentioning the significant advantages shown in McDonald's work now that we can all make use of the user-friendly version of his algorithm as it is provided by the FTDNA application.
In his paper McDonald describes the mathematical basis for a method which merges "the Y-SNP and Y-STR molecular clocks, and takes into account other available evidence (eg:, ancient DNA, proven paper genealogies, relatedness through autosomal DNA, etc)." He demonstrates his revised algorithm using four examples. In three scenarios he generates data which illustrate DNA ancestry either in colonial America, or in historical Scotland and Ireland, or medieval or prehistoric Europe and for the fourth model he uses real data from royal Stuart lineages. With each of these example data sets he illustrates how his combined method gives improvements in the precision and accuracy of the TMRCAs compared to either STR-only or SNP-only methods.
McDonald writes that "the most significant improvements in the precision of the TMRCAs come from the ability to combine both STR and SNP mutations into a single calculation" and he notes that, in the future, improving the definition of STR and SNP mutation rates offers the greatest likelihood of getting further benefits from his combined method over either STR-only or SNP-only TMRCA calculations.
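To give a feel for why combining two independent clocks tightens the interval, here is a minimal sketch. It is not McDonald's algorithm, which works with full probability distributions and extra evidence such as ancient DNA. It assumes only that each clock gives a roughly Gaussian estimate and multiplies the two together (inverse-variance weighting). The input figures and the function name `combine_estimates` are hypothetical, chosen purely for illustration.

```python
import math

def combine_estimates(mean_a, sd_a, mean_b, sd_b):
    """Combine two independent Gaussian estimates by inverse-variance weighting.

    This is the product of two normal likelihoods: a simplifying
    assumption, not the method published by McDonald (2021).
    """
    weight_a = 1.0 / sd_a ** 2
    weight_b = 1.0 / sd_b ** 2
    combined_mean = (mean_a * weight_a + mean_b * weight_b) / (weight_a + weight_b)
    combined_sd = math.sqrt(1.0 / (weight_a + weight_b))
    return combined_mean, combined_sd

# Hypothetical TMRCA estimates in years before present (illustrative only)
snp_mean, snp_sd = 500.0, 150.0   # SNP-only clock
str_mean, str_sd = 420.0, 120.0   # STR-only clock

combined_mean, combined_sd = combine_estimates(snp_mean, snp_sd, str_mean, str_sd)
print(f"combined: {combined_mean:.0f} +/- {combined_sd:.0f} years")
```

Whatever figures are supplied, the combined standard deviation is always smaller than either input, which is the basic reason a merged SNP and STR calculation can be more precise than either alone.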
Table 1: Comparison of TMRCA estimates for haplogroups within the I-Y33765 clade
At present within our clade we only have two haplogroups, I-Y33761 and I-BY198548, for which the dates are known from documentary sources. When we compare the "Discover" TMRCA estimates for both of these haplogroups with those obtained using our normal SNP-only methodologies (Table 1) there are improvements in the precision and accuracy achieved for both. It seems to me that this is the most obvious practical method by which to judge the new FTDNA algorithm and based on this result I consider the new methodology is definitely helpful.
In addition, the 95% confidence interval for the "Discover" estimates is significantly more constrained compared with the YFull SNP-only method. Lastly, the mean dates given by the "Discover" application are broadly similar to those we have obtained using our clade-specific SNP mutation rate that we calculated directly from the nineteen generation Nils Swensson (1631-1713) pedigree. Because of these several positive indicators I have updated our draft I-Y33765 chart (Figure 2) using the TMRCA dates highlighted in Table 1. As you can see the majority of these are taken from the FTDNA "Discover" application.
Figure 2: I-Y33765 draft chart, July 2022
(Click on images and table to enlarge)
References
Balanovsky, O. (2017) Toward a consensus on SNP and STR mutation rates on the human Y-chromosome, Human Genetics, 136, 575-590
McDonald, I. (2021) Improved models of Coalescence Ages of Y-DNA Haplogroups, Genes, 12, 862
Saturday, 12 March 2022
I-Y33765 draft tree March 11th 2022
Here is our latest draft phylogenetic tree for I-Y33765.
Click on image to enlarge
This chart includes a tenth man derived (+) for the I-Y33765 mutation, Clements YS51041. He lives in the Midwestern region of the United States and has recently tested the SNPs Y33765 and BZ4354 at YSeq. His result was derived (+) for Y33765 and ancestral (-) for the BZ4354 test. This is consistent with his documented direct male ancestry which is from the parishes of Marksbury and Compton Dando in the lower Chew Valley, Somerset. His ancestor, George William Clements (1809-1891) was born in the village of Draycott, Somerset and sailed with his family to New York, USA in 1842. This is a further example of the participation of I-Y33765 men in the nineteenth century migrations from Europe.
Genealogy for Clements YS51041 shows that he is an eighth cousin once removed with Clements B742594 (YS32054) and they consequently share a 19 generation pedigree. Their earliest known direct male ancestor is Francis Clement (circa 1630-1708) who was Parish Clerk of Compton Dando during the closing years of the Commonwealth. This pedigree implies that their Y-chromosomes are separated by 647 years, with an average generation time of 34.05y. This well documented pedigree is of similar duration to the Jacobsson IN70815 - Hallberg YF80422 pedigree on the Swedish arm of I-Y33765 which I used in 2020 to estimate a clade specific mutation rate. I hope that shortly I may be able to repeat this process using the new Clements pedigree and so obtain a comparison between SNP mutation rates in the English and Swedish populations of I-Y33765 men.
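The generation-time figure quoted above is simple arithmetic, and a short sketch makes it easy to repeat for other pedigrees. The function name `average_generation_time` is just an illustrative label; the two inputs are the 647 years and 19 generations given in the text.

```python
def average_generation_time(total_years: float, generations: int) -> float:
    """Average years per generation across a documented pedigree."""
    if generations <= 0:
        raise ValueError("generations must be a positive whole number")
    return total_years / generations

# Figures from the Clements pedigree described above
separation_years = 647
pedigree_generations = 19

avg = average_generation_time(separation_years, pedigree_generations)
print(f"{avg:.2f} years per generation")
```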
Sunday, 8 August 2021
A new branch downstream of I-Y33767 defined by BZ4354
Recently, a Big Y-700 test has been completed for Clements B742594. This is the fifth set of next generation sequencing (NGS) results we now have for men on the English arm of the I-Y33765 clade; the previous tests are two Big Y-500 tests (Clement 236748 & 282009) and two Big Y-700 tests (Clement 236748 & IN82043).
Before the completion of the latest analysis, genetic testing has established that the English arm divides into several lineages downstream from I-Y33767, possibly at some time during the later decades of the sixteenth century. Further, using the documented genealogies of the five men known to be derived (+) for Y33765 it seems probable this divergence happened between Clement/s male lines that were living in the valley of the River Chew, North Somerset.
As I discussed in my June blog, a comparison of 111STR results for Clements B742594 and three Clement men (236748, 282009 & IN82043) gave a hint, based on repeat values at two markers DYS481 and DYS717, that he was more closely related to Clement 236748 & Clement 282009. All three men share a 26 repeat motif at DYS481 and they also have a 19 repeat motif at DYS717. In contrast, Clement IN82043 has 25 repeats at DYS481 and 20 repeats at DYS717. So, using these STR comparisons, in the June update I speculated that downstream from I-Y33767 the English branch divided, with Clement 236748 & 282009 with Clements B742594 in one group and Clement IN82043 in the other. As we will see my speculation was correct about there being a branch below I-Y33765 but entirely wrong about the way in which these four men are related.
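The STR comparison described above can be sketched as a pairwise genetic distance on just the two markers quoted. The distance used here is a plain step-wise count (the sum of absolute repeat differences), which is an assumption for illustration and not necessarily how FTDNA computes genetic distance. The repeat values are those given in the text; the function name `genetic_distance` is hypothetical.

```python
from itertools import combinations

# Repeat counts quoted in the post for the two markers of interest
haplotypes = {
    "Clements B742594": {"DYS481": 26, "DYS717": 19},
    "Clement 236748": {"DYS481": 26, "DYS717": 19},
    "Clement 282009": {"DYS481": 26, "DYS717": 19},
    "Clement IN82043": {"DYS481": 25, "DYS717": 20},
}

def genetic_distance(a: dict, b: dict) -> int:
    """Step-wise distance: sum of absolute repeat differences over shared markers."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)

# Distance for every pair of men
distances = {
    (x, y): genetic_distance(haplotypes[x], haplotypes[y])
    for x, y in combinations(haplotypes, 2)
}

for (x, y), gd in distances.items():
    print(f"{x} vs {y}: GD = {gd}")
```

On these two markers alone Clement IN82043 looks like the outlier. That is exactly the grouping the SNP results later overturned, so the sketch also shows how little weight a two-marker difference can bear.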
This error became clear as soon as we had the complete Clements B742594 Big Y-700 result. This shows that Mr Clements is ancestral for three SNPs for which each of the Clement men (236748, 282009 & IN82043) are derived. These mutations are BZ4354, FT314945 & FT324244 and we can now see that they divide the Chew Valley Clement/s into two distinct branches downstream of I-Y33767. Hence, because of these definitive "next generation sequencing" (NGS) data from Mr Clements' Big Y test we can reject my mistaken conclusions based on STR comparisons and instead be confident of this division defined by the BZ4354 SNP. This relationship is illustrated in our new I-Y33765 chart shown below.
Rather than removing the previous blog (June, 2021) that contains my incorrect assessment based on the STR comparisons, it seems to me that, together with this update, it illustrates the pitfalls of relying solely on STR markers and once more demonstrates that only SNP mutations provide a "gold standard" genetic assay for genealogy.
Click on image to enlarge
Monday, 21 June 2021
I-Y33765 draft tree showing Short Tandem Repeat (STR) marker mutations, June 18th 2021
Previously, the charts I have drawn to illustrate our developing understanding of phylogeny for the I-Y33765 clade have been based on Single Nucleotide Polymorphism (SNPs) results from FTDNA Big Y-700 next generation testing. In this draft however I have sketched a tree which shows the estimated positions of Short Tandem Repeat (STR) marker mutations downstream from Y33765. My interpretation is based on comparison of the results of eight men who have each tested 111 STR markers at FTDNA.
Those mutations shown in blue type on the chart are "back" mutations in which the number of repeat motifs at that marker has been reduced compared to the upstream value shown in black, while "forward" mutations, in which the number of repeat motifs has increased, are shown with red characters. The chart demonstrates how STR markers mutate randomly over time with both "forward" and "back" mutations happening at an apparently similar frequency on each of the lineages within our clade.
Most of the observed mutations are, as might be expected, in the fastest mutating STR markers but four of these changes have occurred in markers with the slowest rate of mutation. All these slow marker mutations have occurred at some point during the last ten to fifteen generations, so within the conventional genealogical time-frame. One of the English lineages, Clement IN82043, at marker DYS717 shows an increase from 19 to 20 repeats. On the Swedish arm of the clade, Eklund IN78306 has a "forward" mutation from 18 to 19 repeats at DYS587 and a "back" mutation from 14 to 13 repeats at DYS497. Lastly, the Swedish Jacobsson IN70815 lineage has a "back" mutation from 12 to 11 repeats at marker DYS568. Again, we can interpret these changes as demonstrating the completely random nature of alterations in Y-chromosome STRs.
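The "forward" and "back" labels used on the chart can be expressed as a small rule: compare the observed repeat count with the upstream value. The sketch below applies that rule to the four slow-marker mutations listed above. The function name `classify_mutation` and the tuple layout are illustrative choices only.

```python
from collections import Counter

def classify_mutation(upstream_repeats: int, observed_repeats: int) -> str:
    """Label a change relative to the upstream value, as on the chart."""
    if observed_repeats > upstream_repeats:
        return "forward"   # repeat count has increased
    if observed_repeats < upstream_repeats:
        return "back"      # repeat count has decreased
    return "unchanged"

# (lineage, marker, upstream repeats, observed repeats) as given in the text
slow_marker_mutations = [
    ("Clement IN82043", "DYS717", 19, 20),
    ("Eklund IN78306", "DYS587", 18, 19),
    ("Eklund IN78306", "DYS497", 14, 13),
    ("Jacobsson IN70815", "DYS568", 12, 11),
]

labels = [classify_mutation(up, obs) for _, _, up, obs in slow_marker_mutations]
tally = Counter(labels)

for (lineage, marker, up, obs), label in zip(slow_marker_mutations, labels):
    print(f"{lineage} {marker}: {up} -> {obs} ({label})")
print(dict(tally))
```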
The most significant relational change in this latest version of the I-Y33765 tree concerns the expansion of the English branch downstream from I-Y33767. Comparison of the 111STR results for Clements B742594 (in previous iterations of the I-Y33765 tree shown as YS32054) with those for the other English men has shown that he is most closely related to Clement 236748 and Clement 282009. This finding prompted a reconsideration of the known documented genealogies for these three men and, as a result of this, a putative connection has been found between them with their most recent common ancestor (MRCA), George Clement, who was baptized at St Mary the Virgin, Compton Dando, Somerset, 1st December, 1678. This finding would seem to indicate that the genealogies of all the presently known instances of I-Y33765 in England can now be shown to originate from an area of north-east Somerset close to the parish of Clutton. Previously the earliest known Clement ancestry for 236748 and 282009 was in south Gloucestershire.
Regular readers will recall that I have several times discussed my feeling that a plausible explanation for the localization of an originally Scandinavian I-Y33765 male lineage in north Somerset can be proposed by linking its patriarch to north Somerset manors that, like Clutton, were owned by men of Scandinavian descent during the two generations that preceded the Norman Conquest. It seems to me this latest redefinition of genetic relationships on the English arm of I-Y33765 supports this hypothesis. If you are interested in more discussion on this theme then I suggest you may want to look at these earlier blogs published 24 September 2020, 21 & 23 October 2020 and 11 April, 2021.
Click on image to enlarge
In the above post, my conclusions about the genetic relationships between the five English Clement men, which I had based on a comparison of their Short Tandem Repeat (STR) 111 marker test results, are incorrect. Consequently, in the above chart the arrangement of branches downstream of Y33767 is wrong and should be ignored. My mistake has been confirmed by the "next generation sequencing" results from the Big Y-700 analysis completed for Clements B742594 in July 2021 (for a full update please see my blog published 8 August 2021).
It seems to me that it is probably helpful to leave this incorrect post on-line because it illustrates the folly of relying solely on STR markers and nicely demonstrates that only Single Nucleotide Polymorphism (SNP) mutations can be considered a "gold standard" marker for genetic genealogy.
JAC, 8th August, 2021
Thursday, 18 February 2021
I-Y33765 draft tree February 17th 2021
This latest version of our I-Y33765 phylogenetic tree includes the recently completed 111STR upgrade analysis for Dahlberg IN81271.
From his previous 37STR result it had seemed likely that Dahlberg and Hallberg YF80422 may have shared a common direct-line male ancestor within the last 200-300y. However, based on his new upgrade result it is now clear that this was an incorrect assumption. It now seems more probable, based on STR comparisons between all four Swedish men for whom we have confirmed Y33765 derived (+) status, that his lineage, and those of Hallberg YF80422 and Jacobsson IN70815, diverge at the level of BY198548. This relationship is shown on our revised chart but, as yet, Dahlberg's status for the BY198548 SNP has not been confirmed by PCR.
Click on chart to enlarge
Tuesday, 2 February 2021
I-Y33765 draft tree January 14th 2021
The style of our latest I-Y33765 phylogenetic tree has changed from that used in previous drafts. The intention is that this new format should make it easy to see some useful characteristics of the genetic markers directly from the chart.
For example, the names of single nucleotide polymorphism (SNP) markers are now shown in either red or green; those in red are located within the combBED (non-recombining) regions of the Y-chromosome and consequently we can have high confidence that these SNPs are true mutations suitable for use in age estimations. On the other hand, SNP names shown in green or purple are in less stable areas of the Y-chromosome and may consequently be less reliable and possibly false positives. Using the high confidence SNPs to calculate the time to most recent common ancestor (TMRCA) for a particular mutation, we can total the number of red SNPs shown on each downstream branch and multiply that number by the clade-specific mutation rate as estimated using the Swensson pedigree (see post January 2021).
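The counting method described above can be sketched in a few lines. The sketch assumes the clade-specific rate is expressed as years per high-confidence SNP, so that multiplying by a SNP count gives years. The rate value and the branch SNP counts below are hypothetical placeholders, not the figures derived from the Swensson pedigree or read from the chart.

```python
def tmrca_years(high_confidence_snps: int, years_per_snp: float) -> float:
    """Estimate years back to the common ancestor along one branch."""
    if high_confidence_snps < 0:
        raise ValueError("SNP count cannot be negative")
    return high_confidence_snps * years_per_snp

# Hypothetical placeholder rate; substitute the clade-specific value
YEARS_PER_SNP = 100.0

# Hypothetical counts of red (high confidence) SNPs on two downstream branches
red_snps_by_branch = {"branch A": 4, "branch B": 5}

estimates = {
    branch: tmrca_years(count, YEARS_PER_SNP)
    for branch, count in red_snps_by_branch.items()
}

for branch, years in estimates.items():
    print(f"{branch}: about {years:.0f} years")
```

Because each branch contributes only a handful of SNPs, adding or dropping a single one shifts the estimate by a full `YEARS_PER_SNP`, which is why only the high confidence markers are counted.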

The orange dash symbols on a portion of the Swedish arm of the chart, linking Jacobsson IN70815 and Hallberg YF80422, show the position of the 19 generation Swensson pedigree used to calculate our clade-specific SNP mutation rate.
Click on charts to enlarge
Wednesday, 23 December 2020
I-Y33765 draft tree December 23rd 2020
This is our latest draft phylogenetic tree for I-Y33765.
The chart includes information from a FTDNA Big Y-700 test that has recently been completed on the Swedish arm of I-Y33765. This is the seventh Big Y test to have been completed within the I-Y33765 clade. This analysis for Mr Hallberg has identified five novel variants on his branch downstream from FT250135 (8564554, 8595182, 8603627, 15419322 and 17156586). It seems possible that Dahlberg IN81271 may be derived for one or more of these novel SNPs. Two earlier Big Y-700 tests on the Swedish arm, Eklund IN78306 and Jacobsson IN70815, have five and four novel variants respectively.
On the English arm, the FTDNA Big Y-700 test for Clement IN82043, which was completed in October, showed four novel variants. One of these was at the position of a SNP that had been named Z22427 in 2015. FTDNA initially identified the Clement polymorphism as Z22427 and this is how it was shown on the previous I-Y33765 draft tree dated November 7th, 2020. However, while the Z22427 mutation is G > T, the Clement variant is G > A and, as a consequence, FTDNA have subsequently named this variant FT374048. Following my suggestion YSeq have designed primers for FT374048 and for another of the IN82043 novel variants, which has also been named, this time by YFull, as Y223449. Clements YS32054 is currently being tested for both these SNPs at YSeq.
Click on chart to enlarge
Thursday, 12 November 2020
I-Y33765 draft tree November 12th 2020
This is our latest draft phylogenetic tree for I-Y33765.
The chart includes 6 new SNPs common to the English arm of I-Y33765. These have recently been identified and named by FTDNA (FT310619; FT314713; FT314945; FT321056; FT324244) and YFull (Y207591) from comparison of Big Y-700 data for FTDNA kit numbers 236748 and IN82043. This means that at present there are 19 phylogenetically significant mutations on the English arm and this total is approximately twice the number so far identified on the Swedish arm. At present the reason for this difference in mutation frequency is unclear but I intend to explore some possible explanations in a future article.
FTDNA consider that one of their new SNPs, FT310619, is the confirmed haplogroup for Clement IN82043. Prior to identifying their five new SNPs they had used Y33767 as the confirmed haplogroup for Clement IN82043 and this is shown on our latest I-Y33765 tree (below). As FT310619 and Y33767 are phyloequivalent either designation is valid at present. In this connection Bernie Cullen (I2a Project activity feed, 10 November 2020) has observed "These SNPs are equivalent, maybe FTDNA prefers to use FT series SNPs to name branches." It seems to me that as Y33767 was identified and named by YFull in 2017, but FTDNA has only just named FT310619, the earlier YFull SNP designation is more appropriate for the branch name. From his Big Y-700 analysis Clement IN82043 also has one private variant (Hg38, 15198490) and this has now been named Y223449 by YFull.
Within the past few days YSeq have tested Clements YS32054 for the Y33767 SNP and have confirmed his status is derived (+). This finding supports my hypothesis that the Clement/s lineages on the English arm share a most recent common ancestor who was living in north Somerset, perhaps in the Chew valley, towards the close of the fifteenth century.
Click on chart to enlarge
Sunday, 1 November 2020
I-Y33765 draft tree November 1st 2020
This is our latest draft phylogenetic tree for I-Y33765.
The chart includes information for SNPs following the completion of a Big Y-700 analysis for Clement IN82043 and confirmation for Clements YS32054 of his derived status for Y33765 following a YSeq assay for that SNP.
On the Swedish, FT250135, arm of I-Y33765 Mr Hallberg has recently ordered a FTDNA Big Y-700 analysis. This is the third Big Y-700 test on this branch and will be helpful in improving age estimates for the divergence between our four Swedish lineages which are, at present, largely based on STR results. Also, on the English, Y33767 arm of I-Y33765 we intend shortly to test Clements YS32054 for the Y33767 SNP at YSeq. At present we speculate that his lineage (as implied by the chart) has also diverged at Y33767.
The dates included are based either on the YFull age estimation methodology or on FTDNA TiP estimates or on known dates from documented genealogy. The estimated dates are only a very approximate guide to the age of branches and are most likely to have significant errors. The dates of branches will probably be revised in subsequent drafts.
Click on chart to enlarge
Thursday, 8 October 2020
I-Y33765 draft tree October 2020
This is the latest draft tree for I-Y33765. Some already identified SNPs on the English branch cannot be placed correctly until we have the result of the ongoing FTDNA Big Y-700 test for Clement IN82043. Clements YS32054 is also awaiting a test result for Y33765 from YSeq and at present is only predicted as derived on the basis of his documented genealogy.
The STR genetic distances (GD) shown are correct across the entire chart based on the six 111STR results we have. Dahlberg IN81271 has tested 37STR but his position in the chart is correct (he is confirmed derived for Y33765 at YSeq). The more 111STR results that become available the more confident we can be in the GD relationships but, because STR markers can "back mutate", anomalies may become apparent due to the "convergence" artifact. So far that does not seem to be an issue.
The dates included are based on YFull or FTDNA TiP estimates or on known dates from documented genealogy. The estimated dates are only a very approximate guide to the age of branches and are most likely to have large errors. The dates of branches will likely be revised in subsequent drafts; again as more SNP and STR results are added on descendant branches the reliability of shared dates may be improved.
Click on chart to enlarge.
Tuesday, 22 September 2020
The identification and phylogenetic position of I-Y33765
In 2017, YFull.com, the Y-chromosome sequence identification service, identified and named the I-Y33765 SNP when comparing the BAM files of FTDNA Big Y-500 test results for two men who shared the Clement surname and had earliest known direct male ancestry during the eighteenth century in the parish of Compton Greenfield, south Gloucestershire, England. At that time the mutation was one of ten novel SNPs identified in these men downstream of I-Y4252.
In February 2020, the same mutation was found in an FTDNA Big Y-700 test taken by a man named Jacobsson with seventeenth century direct male ancestry in Tjust, Småland, Sweden. Subsequently two other men named Eklund and Dahlberg with seventeenth and eighteenth century ancestry in Tjust and Kinda, Småland, Sweden have been confirmed with the marker (see map). From comparison of the English and Swedish results, and using their estimated Y-chromosome mutation rate, YFull calculate that Y33765 formed about 670AD. At present it seems probable that this happened somewhere in southern Scandinavia possibly in Småland, south east Sweden, and perhaps in the hundred of Tjust.