A Big Y-700 DNA Experiment

Have you received your Y-DNA test results but don’t feel like you learned as much as you thought you would? This is how I felt, too, until I immersed myself in the scientific details of FamilyTreeDNA’s Big Y-DNA test and designed an “experiment” to refine my test results.

The experiment is replicable and can help you maximize the value from this advanced genealogical research tool. In fact, it just might help you determine how you’re related to one of your mystery Y-DNA matches. The experiment worked for me.

Before I describe the experiment, allow me to briefly explain the most important Big Y-700 concepts. I promise not to get too technical, but interested readers are encouraged to read my earlier blog post on Y-DNA tests or visit FamilyTreeDNA or the International Society of Genetic Genealogy for greater depth.

Big Y-700 Test
For those new to Y-DNA, FamilyTreeDNA’s Big Y-700 is its most advanced test for tracing paternal ancestry. It tests 700 markers on the Y chromosome, which is passed down from father to son, so only men can take the test. The science behind it is quite rich, but I think the most important concepts to grasp are 1) mutations, 2) haplogroups, and 3) age estimates for the time to most recent common ancestor (TMRCA).

Mutations
Y-DNA is made up of genetic code typically abbreviated as A, C, T, and G.[1] With each generation, Y-DNA is copied from father to son. However, errors may occur during copying where, for example, an A may be copied as a T. Such an error is called a mutation. These mutations do not have any medical implications, but they are passed down from father to son. Scientists have determined the original value for each Big Y-700 marker tested, enabling paternal lines to be traced by following mutated marker values backwards through time to their original values.

Haplogroups
Each mutation creates a new branch on the paternal family tree. When two or more men share the same set of mutations across all tested markers, we say they share a common paternal ancestor, which is represented in the paternal family tree as a haplogroup.[2] Haplogroups are given a unique alphanumeric identifier, such as I-Y106972, which is my haplogroup.

Age Estimates
Mutations are important because they permit scientists to predict how far back in time the haplogroup, or branch in the paternal family tree, was created. Mutations occur infrequently, but scientists have estimated Y-DNA mutation rates, permitting age estimates for the time to the most recent common ancestor (TMRCA).[3] Mutation rates vary across the paternal family tree,[4] but scientists estimate a mutation may occur about every 83 years.[5] For example, the accumulated mutations for an individual tester may suggest that their most recent common ancestor with others in their haplogroup was born about 1750. The actual calculations are more complex than this, but it works as a quick back-of-the-envelope estimate.
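
For readers who like to see the arithmetic, here is a minimal Python sketch of that back-of-the-envelope estimate. It assumes the rough figure of one mutation about every 83 years; the function name, the tester's birth year, and the mutation count are hypothetical values chosen for illustration. It is not how FamilyTreeDNA calculates its TMRCA estimates.

```python
# Rough rule of thumb quoted above: about one mutation every 83 years.
YEARS_PER_MUTATION = 83

def rough_ancestor_birth_year(tester_birth_year, mutation_count,
                              years_per_mutation=YEARS_PER_MUTATION):
    """Count back one mutation interval per mutation from the tester's birth year."""
    return tester_birth_year - mutation_count * years_per_mutation

# Hypothetical tester born in 1999 who carries 3 accumulated mutations.
print(rough_ancestor_birth_year(1999, 3))  # 1750
```

With these made-up inputs, three mutations at 83 years each count back 249 years, which lands on a common ancestor born about 1750.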

The Experiment
Background Information and Study Motivation
Like most research in genetic genealogy, this project started with a mystery match. Analysis of shared matches within my autosomal AncestryDNA results identified an unlinked family cluster headed by William Wilson (1827-1896). An unlinked family cluster is a large group of matches who all descend from a single ancestor or couple but for whom you are unable to establish a genetic relationship.[6]

Based on the shared matches among the unlinked family cluster, which ranged from 15 to 62 cM in length, the genetic connection with the cluster appeared to be on my Wilson line. The descendancy of my Wilson family tree is robust starting with the immigrant ancestor who was born about 1715 and coming forward with all his lines. However, I could not reliably place the William Wilson unlinked family cluster within my tree.

Suspecting our Wilson line is where we connect, I used targeted DNA testing and found a male Wilson descendant from the William Wilson (1827-1896) unlinked family cluster willing to take the Big Y-700 test. Results were a match and indicated he and I were part of the same haplogroup (I-Y106972) with a TMRCA of 1814 (see Figure 1 below, which is taken from FamilyTreeDNA’s scientific details section of the Discover™ Haplogroup Reports tool).

FamilyTreeDNA scientific details initial results 1814 TMRCA

The Big Y-700 results also indicated that I have one mutation that my new cousin does not, and my cousin has one mutation that I do not. Because no one else who has tested shares these mutations on this branch of the paternal family tree, FamilyTreeDNA denotes these mutations as private variants. This fact becomes important momentarily.

Based on what I knew about our respective genealogies, I couldn’t help but think that the 1814 TMRCA was misleading despite being based on sound scientific calculations. I understand that the 1814 TMRCA is just a mean and that the actual birth year for our common ancestor could be anywhere from 1658 to 1917 (see Figure 1). Still, this mean perplexed me because William Wilson (1827-1896) could not be a son of any of my known direct paternal ancestors going back to 1715: each already has a son named William Wilson who is accounted for and confirmed in other records as a separate individual in both time and place. Therefore, it is probable that William Wilson (1827-1896) is a nephew or cousin of one of my direct paternal ancestors.

So, what might explain the 1814 TMRCA? I believe the answer rests in our private variants. Because my new cousin and I each have a private variant, learning the generation where the mutation occurred might help me figure out where William Wilson (1827-1896) fits into my paternal family tree. To help me visualize our possible relationship, I created a paternal family tree identifying a few options for where he might fit in (see Figure 2 below). In the figure, I am represented by “A” and my new cousin as “B”.

Relationship between myself and the mystery match

The Actual Experiment
The experiment is guided by two principles. First, I needed to find other Wilson testers who share the same private variants (mutations) as me and my new cousin. The identification of others who share these variants would create new haplogroups whose placement on the paternal family tree would be more recent than the current haplogroup, thereby permitting a more refined estimation of where the unlinked family cluster fits within my tree. Second, I remembered from statistics class that as the size of a sample grows, its mean becomes closer to the population’s average. Therefore, getting others to test who are suspected to be somewhat closely related on the Wilson line should affect the TMRCA calculations and refine when the common ancestor for the haplogroup was likely born.

The experimental design involved using targeted testing in what I call a laddering approach. That means finding another male Wilson descendant from each of my green generations 1, 2, and 3 (see Figure 2 above) to take the Big Y-700 test, where his common paternal ancestor with me is only from that generation. These older generations are likely where the new cousin fits in (see Figure 2). Testing starts with the most recent generation (my green generation 3) and progresses up the ladder one test and rung at a time, so the mutations can be studied after each test to determine whether it is necessary to continue testing up the ladder to an older generation. What I’m hoping to discover is whether the new tester shares the same private variant as me. If he does, then our new haplogroup is probably more recent. If he doesn’t, then our common ancestor represented by our current haplogroup is probably further back in time.

I also needed an additional Big Y tester for the unlinked family cluster’s line. More specifically, I needed another descendant of William Wilson (1827-1896) but through one of his other sons, i.e., targeted tester D as denoted in Figure 3 below. The hope here was that tester D and my new cousin, tester B, share the same private variant mutation. If they do, it will indicate that their newly formed haplogroup is more recent than our collective and current haplogroup of I-Y106972. Figure 3 summarizes the initial testing plan.

Big Y Experimental Design

 

Experimental Results
The first Big Y results to post were for targeted tester C. The common ancestor he and I share is John Wilson (1784-1840) at generation 3. Tester C does not share my private variant, but he does possess two other private variants that I do not. Nor does tester C share any private variants with tester B. However, all three of us still share the same haplogroup of I-Y106972. As you can see below in Figure 4, the Big Y results now suggest our common ancestor was likely born around 1748.

FamilyTreeDNA scientific details initial results 1748 TMRCA

The addition of tester C’s results pushed the mean birth year for the TMRCA backwards from 1814, as previously shown in Figure 1, to 1748 because targeted tester C has two mutations while tester B and I (tester A) each have only one. Although tester C and I share the same common ancestor, who was born in 1784, the TMRCA calculations suggest the common ancestor for the haplogroup is further back in time because, on average, mutations occur about every 83 years, and tester C has two mutations. Initially, tester B and I each had only one mutation, which made it appear that our common ancestor was more recent, i.e., 1814. (Note: 83 years is a rough approximation genealogists use. FamilyTreeDNA calculations are much more robust,[7] but a detailed discussion of FamilyTreeDNA’s calculations is outside the scope of this post.)
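
To see why one extra tester can move the estimate, here is a small Python sketch that averages the private variant counts before and after tester C was added and converts the average to years using the rough 83-year figure. The function name is my own, and the simple averaging is an assumption made for illustration. It shows only the direction of the shift and does not reproduce FamilyTreeDNA’s 1814 or 1748 estimates.

```python
from statistics import mean

YEARS_PER_MUTATION = 83  # rough approximation quoted in the text

def mean_years_back(private_variant_counts):
    """Average mutation count across testers, converted to years."""
    return mean(private_variant_counts) * YEARS_PER_MUTATION

before = mean_years_back([1, 1])     # testers A and B: one mutation each
after = mean_years_back([1, 1, 2])   # tester C adds two mutations

print(round(before), round(after))   # 83 111
```

The average climbs from one mutation to about 1.33 mutations per tester, so the rough estimate reaches further back in time, just as the reported TMRCA did.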

The next Big Y results to post were for targeted tester D, whose common ancestor with tester B is William Wilson (1827-1896). Testers B and D have one mutation that they both share, which allowed it to form a new haplogroup, I-Y98226. However, tester D has one additional mutation that tester B does not. The additional mutation possessed by tester D suggests the haplogroup they share (I-Y98226) is probably a bit further back in time than initially reported. As such, FamilyTreeDNA now suggests the new TMRCA for their new haplogroup is 1809 compared to 1814 (see Figure 5).
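
The way a shared private variant turns into a new haplogroup can be pictured with simple set operations. In this Python sketch the variant names are hypothetical placeholders, since the real variant identifiers are not listed in this post. Only the counts come from the results described so far: one variant for tester A, two for tester C, and one shared by testers B and D plus one extra for tester D.

```python
# Variant names are hypothetical placeholders, not real identifiers.
variants = {
    "A": {"var_a1"},
    "B": {"var_bd"},
    "C": {"var_c1", "var_c2"},
    "D": {"var_bd", "var_d1"},
}

def shared_variants(testers):
    """Variants carried by every tester in the group."""
    return set.intersection(*(variants[t] for t in testers))

branch_bd = shared_variants(["B", "D"])
print(branch_bd)                    # {'var_bd'}
print(variants["D"] - branch_bd)    # {'var_d1'}
print(shared_variants(["A", "C"]))  # set()
```

The variant shared by B and D is what defines the new branch, tester D keeps one variant that is still private, and testers A and C share nothing beyond the existing haplogroup.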

FamilyTreeDNA scientific details initial results 1809 TMRCA

Yet, because testers B and D also share membership in haplogroup I-Y106972 along with tester C and me (A), their new haplogroup is more recent and falls underneath I-Y106972, as shown in Figure 6 in FamilyTreeDNA’s time tree.[8] As such, I-Y98226 is considered a child of haplogroup I-Y106972. The addition of tester D’s results also changed the TMRCA for I-Y106972 from 1748 to 1743, because tester D has an additional private variant mutation that tester B does not.

 

Discussion
The initial results, which included just me (tester A) and tester B, were misleading. Because each of us had only one mutation, which the other did not share, the mean TMRCA was suggested to be more recent than it actually was (1814 initially and 1743 now). The results of tester C pushed the mean TMRCA back to 1748 because he had two mutations that neither I nor the mystery match had. When all four of us completed our testing, the TMRCA moved a little further back to 1743.

The results of this “experiment” also demonstrated that the haplogroup for testers B and D is more recent than my haplogroup. As previously shown, their haplogroup (I-Y98226) is considered a child of my haplogroup (I-Y106972). As such, all four of us are members of I-Y106972, but tester C and I (tester A) are not members of I-Y98226.
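
The parent and child relationship between the two haplogroups can be sketched as a tiny tree in Python. The dictionary and function names are my own, and the tree holds only the two haplogroups discussed in this post. It is an illustration of the membership logic, not a representation of FamilyTreeDNA’s time tree.

```python
# Child haplogroup -> parent haplogroup, as described in the text.
PARENT = {"I-Y98226": "I-Y106972"}

# Most specific haplogroup for each tester after all four results posted.
TERMINAL = {
    "A": "I-Y106972",
    "B": "I-Y98226",
    "C": "I-Y106972",
    "D": "I-Y98226",
}

def lineage(haplogroup):
    """The haplogroup followed by its ancestors, child first."""
    chain = [haplogroup]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def members(haplogroup):
    """Testers whose own haplogroup is this one or a descendant of it."""
    return sorted(t for t, h in TERMINAL.items() if haplogroup in lineage(h))

print(members("I-Y106972"))  # ['A', 'B', 'C', 'D']
print(members("I-Y98226"))   # ['B', 'D']
```

Membership flows upward only: testers on the child branch also belong to the parent, while testers who sit on the parent branch do not belong to the child.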

I learned something else that is a bit harder to grasp and perhaps even harder to convey. While I-Y98226 and I-Y106972 each represent a common ancestor, I still don’t know which of our ancestors are represented by the haplogroups. It is true that the common ancestor for testers B and D is William Wilson (1827-1896) and the common ancestor for testers A and C is John Wilson (1784-1840), but William and John may not be the originators of their respective haplogroups. For example, William Wilson (1827-1896) is certainly a member of I-Y98226, but his father or even grandfather may have been the first in his line to receive the mutation resulting in I-Y98226, which was then passed down to William and thus to testers B and D.

Ultimately, which ancestor was the first to have the mutation may be purely academic. While interesting, I don’t really need to know this to understand where the William Wilson (1827-1896) unlinked family cluster fits into my tree. Based on the refined results, it seems quite probable that William Wilson’s common ancestor with me is either John Wilson (1784-1840) or John’s father, William Wilson (~1758-1804) (see Figure 2). It is probably not necessary to test up the ladder to the next generational rung. I have other tools at my disposal that, when combined with the refined Big Y results, can help me identify the unlinked cluster’s exact connection into my family tree. In particular, reviewing autosomal DNA matches from multiple descendants of William Wilson (1827-1896) may help locate genetic networks that contain ancestors associated with the wives of John Wilson (1784-1840) and/or William Wilson (~1758-1804). Their presence or absence can signal at which generation William Wilson (1827-1896) fits. This will be the subject of a future blog post series that I will call “Anatomy of a Mystery DNA Match”, where I will reveal exactly how William Wilson (1827-1896) fits into my tree.

Summary
Through this “experiment”, I hope I have been successful in demystifying the Big Y-700 test by illustrating its three most important concepts (i.e., mutations, haplogroups, and age estimates) and by demonstrating how targeted testing can be used to refine its results. If you are fortunate enough to have good Big Y-DNA matches, especially those with whom you are unable to determine the common ancestor, I hope I have inspired you to ask other known paternal-line cousins who are perhaps third cousins or greater to take the Big Y-700 test. Doing so might not only transform private variants into haplogroups but also refine the TMRCA estimates with existing matches and help you learn more about your paternal ancestry.



Sources
[1] FamilyTreeDNA (n.d.), Y-SNP Testing. Accessed 26 January 2024 at https://help.familytreedna.com/hc/en-us/articles/4414479800463-Y-SNP-Testing#your-ancestral-origins-0-0.
[2] Rowe-Schurwanz, Katie (2023, November 22), Big Y Lifetime Analysis: The Myth of the Manual Review. FamilyTreeDNA Blog. Accessed 26 January 2024 at https://blog.familytreedna.com/big-y-manual-review-lifetime-analysis/.
[3] FamilyTreeDNA (2022, September 9), FamilyTreeDNA Enhances TMRCA Estimates for Improved Family History Research. FamilyTreeDNA Blog. Accessed 26 January 2024 at https://blog.familytreedna.com/tmrca-age-estimates-update/.
[4] Ibid.
[5] University of Strathclyde Glasgow (n.d.), Genetic Genealogy Research: SNP Dating. Accessed 26 January 2024 at https://www.strath.ac.uk/studywithus/centreforlifelonglearning/genealogy/geneticgenealogyresearch/snpdating/.
[6] Bettinger, Blaine (2023), The Growing Phenomenon of the Unlinked Family Cluster. Accessed 17 October 2023 at https://thegeneticgenealogist.com/2023/03/16/the-growing-phenomenon-of-the-unlinked-family-cluster/.
[7] FamilyTreeDNA (2022, September 9), FamilyTreeDNA Enhances TMRCA Estimates for Improved Family History Research. FamilyTreeDNA Blog. Accessed 26 January 2024 at https://blog.familytreedna.com/tmrca-age-estimates-update/.
[8] A third Y-DNA kit is visible in Figure 6 for the I-Y106972 haplogroup. This individual has only taken the Big Y-500 test and is therefore not fully comparable with those who have taken a Big Y-700 test. However, he does descend from John Wilson (1784-1840) like testers A and C but through a different son than both testers A and C.


 
