Using Y-DNA to Analyze Genetic Networks and Unlinked Family Clusters

Imagine finding a DNA match who has the surname you're researching in their family tree, and whose ancestor turns out to be a descendant of the 5x great grandfather you have yet to identify. If only you knew this was the match you've been waiting for to break down that stubborn brick wall. Well, this is not a genealogical dream; it's a use case for Y-DNA used in conjunction with autosomal DNA.

The ability to advance our genealogy comes from the combined use of autosomal and Y DNA. While most of us are comfortable using autosomal DNA, many are either unsure how to use Y-DNA with autosomal DNA or are intimidated by interpreting Y-DNA results.

In this blog post, I discuss how Y-DNA can be used to help analyze a group of shared matches (genetic networks) and the unlinked family clusters found within them. I also provide a current case study from my Hill genetic network research demonstrating the synergetic value of autosomal and Y DNA.

Quick Definitions for Autosomal and Y DNA

If you are unfamiliar with these terms, autosomal DNA refers to the DNA inherited equally from both parents, and companies like Ancestry, MyHeritage, and FamilyTreeDNA provide tests that give genealogists access to match lists for tracing ancestry and identifying relatives. Y-DNA, on the other hand, is passed down only from father to son, enabling genealogists to determine paternal ancestry. FamilyTreeDNA is the best company offering affordable direct-to-consumer Y-DNA testing for genetic genealogy.

Y-DNA and Genetic Networks

A genetic network is a group of shared autosomal DNA matches who have a common ancestor. You create one when you find an interesting match and click the shared match or in-common-with button on a DNA testing website. Genetic networks are frequently used to discover the identity of an unknown ancestor: the intent is to analyze the shared matches' family trees for clues that lead to documentary or other DNA evidence capable of resolving brick wall ancestors. These clues often present themselves as unlinked family clusters.
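For readers who like to see the mechanics, the idea can be sketched in a few lines of Python. This is a minimal sketch, assuming you have copied each match's shared-match list from a testing website into a dictionary by hand; the function name `genetic_network` and the sample match names are hypothetical and not part of any company's tools.

```python
# A minimal sketch: a genetic network is the seed match plus the matches
# the testing site lists as shared with (in common with) that seed.
# The data below is hypothetical and would be transcribed by hand.

def genetic_network(seed: str, shared: dict[str, set[str]]) -> set[str]:
    """Return the seed match together with every match shared with it."""
    return {seed} | shared.get(seed, set())


if __name__ == "__main__":
    shared = {
        "Match A": {"Match B", "Match C"},
        "Match B": {"Match A", "Match D"},
    }
    print(sorted(genetic_network("Match A", shared)))
    # ['Match A', 'Match B', 'Match C']
```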

An unlinked family cluster is a large group of matches who all descend from a single ancestor, but whose connection to your family tree you have not been able to establish. Sometimes the common ancestor of the unlinked family cluster shares the same surname as the ancestor you're researching. This is where Y-DNA can help.
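Continuing the same hypothetical sketch, one way to surface unlinked family clusters is to group network members by an ancestor who appears in their trees and keep only sizable groups whose ancestor is not yet placed in your own tree. The `min_size` threshold of 3 is an arbitrary assumption standing in for "large group," and the ancestor labels are placeholders.

```python
from collections import defaultdict


def unlinked_clusters(
    trees: dict[str, set[str]],
    linked_ancestors: set[str],
    min_size: int = 3,
) -> dict[str, set[str]]:
    """Group matches by a shared tree ancestor, keeping large, unlinked groups.

    trees maps each network member to the ancestors found in their tree;
    linked_ancestors are ancestors already placed in your own tree.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for match, ancestors in trees.items():
        for ancestor in ancestors:
            groups[ancestor].add(match)
    return {
        ancestor: members
        for ancestor, members in groups.items()
        if len(members) >= min_size and ancestor not in linked_ancestors
    }


if __name__ == "__main__":
    trees = {
        "M1": {"Ancestor X"},
        "M2": {"Ancestor X"},
        "M3": {"Ancestor X", "Ancestor Y"},
        "M4": {"Ancestor Y"},
    }
    print(unlinked_clusters(trees, linked_ancestors=set()))
```

Because a descendant's tree also contains that ancestor's parents and grandparents, real data will produce overlapping groups at several generations, which still need to be sorted out by hand.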

When you find an unlinked family cluster that shares your surname of interest, and a direct paternal-line descendant of the cluster's common ancestor is a known Y-DNA match, you can use this information, along with other observations gathered from your autosomal DNA analysis, to theorize the lineage of your ancestral line further back in time. This is the genealogical gold at the end of an ancestral rainbow!

What Y-DNA Tells You

While FamilyTreeDNA provides Y-DNA testing for 37 and 111 markers, this discussion relies on the reports associated with the Big Y-700 test. For genetic network and unlinked family cluster research, the Big Y offers the following helpful reports:

  • Scientific Details
  • Match Time Tree
  • Match Lists

For those interested in a more visual description of the Big Y-700 DNA test, I have several YouTube learning modules covering a variety of topics from basic report tutorials to advanced analyses.

Scientific Details

This report provides an estimated birth year for the most recent common ancestor (MRCA) you share with your Y-DNA matches, calculated from the number of Y-DNA marker mutations between matches. At the risk of becoming too technical, it helps to know that each MRCA corresponds to a haplogroup, a branch of the Y-DNA tree defined by the mutations that ancestor passed down to all of his male-line descendants. That is why the reports label MRCA estimates with haplogroup names. The image below is taken from the Big Y-700 results for my Parker ancestry.

Scientific details report from FamilyTreeDNA Big Y-700 DNA test for Parker ancestry
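To give a feel for how mutation counts translate into dates, here is a deliberately simplified point estimate in Python. It assumes each paternal line gains mutations at a constant average rate and simply averages the two lines. FamilyTreeDNA's actual age estimates come from a more sophisticated statistical model, which is also what produces the confidence intervals discussed below. The function name is hypothetical, and `years_per_snp` is an assumed input you supply, not a published constant.

```python
def naive_mrca_birth_year(
    snps_line_a: int,
    snps_line_b: int,
    tester_birth_year: int,
    years_per_snp: float,
) -> float:
    """Rough MRCA birth year from mutations accumulated below the shared branch.

    snps_line_a / snps_line_b: mutations each tester carries that the other
    does not (i.e., gained since the MRCA). years_per_snp is an assumed,
    illustrative average interval between mutations on one line.
    """
    mean_snps = (snps_line_a + snps_line_b) / 2
    return tester_birth_year - mean_snps * years_per_snp


if __name__ == "__main__":
    # Hypothetical inputs: two testers born around 1950 carrying 3 and 4
    # private mutations, and an assumed 80 years per mutation.
    print(naive_mrca_birth_year(3, 4, 1950, 80.0))  # 1670.0
```

Each additional tester on a branch contributes another mutation count, which is one intuitive way to see why the estimates tighten as more cousins test.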

Confidence intervals are also provided here, enabling researchers to better interpret their results. Even so, I find the provided mean fairly accurate, especially when more than two testers contribute to an estimated MRCA (haplogroup). In my own research, I rely on the mean and the dark blue band surrounding it, which is the 68% confidence interval.

For those unfamiliar with confidence intervals, consider the mean estimated birth year of 1706 for the MRCA shown in the graphic above. A 68% confidence interval means that about 32% of the time, the MRCA's actual birth year would fall outside the interval surrounding that mean. In other words, relying on the 68% interval carries roughly a 32% chance of being incorrect.
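If it helps to see where the 68% figure comes from, the sketch below uses Python's standard `statistics.NormalDist`. It assumes, purely for illustration, that the uncertainty around the mean follows a symmetric bell curve. FamilyTreeDNA's intervals are not necessarily symmetric (the Parker 99% interval of 1293 to 1925 is not centered on 1706), so treat this as intuition rather than a model of their calculation. The function name `share_inside` is hypothetical.

```python
from statistics import NormalDist


def share_inside(k_sd: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    std = NormalDist()  # mean 0, standard deviation 1
    return std.cdf(k_sd) - std.cdf(-k_sd)


if __name__ == "__main__":
    inside = share_inside(1.0)
    print(f"within 1 SD: {inside:.3f}; outside: {1 - inside:.3f}")
    # within 1 SD: 0.683; outside: 0.317

    # Standard deviations needed on each side to capture 99% of a normal curve.
    print(f"99% needs about {NormalDist().inv_cdf(0.995):.2f} SD each side")
    # 99% needs about 2.58 SD each side
```

Under this simplified picture, the much larger multiplier needed for 99% coverage is why a 99% band is so much wider than a 68% band.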

You might be inclined to believe the 99% confidence interval is better given its lower error rate. However, the interval is too wide, and not very useful for most genealogists. In the above image, the 99% confidence interval is between 1293 and 1925 (a 632-year span). If you want to minimize the chances of being incorrect, it’s a great choice. However, for genealogists, it’s not that helpful because it can’t help us narrow down the generation where a match likely fits into our family tree.

As stated previously, the more cousins you test within your haplogroup or within the parent haplogroup to which your haplogroup descends, the more accurate the mean estimated birth year for your MRCA becomes and the narrower the confidence bands become.

Match Time Tree

This report is perhaps the most valuable as it places your Big Y-700 matches on an easy-to-read timeline estimating when each match likely connects into your family tree. Each icon on the timeline represents the mean estimated birth year for the MRCA (haplogroup) between you and your matches. Hovering over an MRCA haplogroup icon provides the estimated birth year for the MRCA and corresponds to the mean in the Scientific Details report. The image below is for my Parker ancestry.

Match time tree report from FamilyTreeDNA Big Y-700 DNA test for Parker ancestry

For genetic network research, the Match Time Tree can help you visually decide at which generation an unlinked family cluster of the same surname likely connects into your family tree based on the estimated MRCA – assuming you have a Big Y match for the cluster.
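The generation arithmetic behind that visual judgment is simple enough to sketch in Python. This assumes an average generation length of 30 years, a common rule of thumb rather than anything FamilyTreeDNA reports. The function name is hypothetical, and the known ancestor's birth year in the example is made up for illustration.

```python
def generations_between(mrca_birth_year: float, ancestor_birth_year: float,
                        years_per_generation: float = 30.0) -> float:
    """Approximate number of generations from an estimated MRCA down to a known ancestor."""
    return (ancestor_birth_year - mrca_birth_year) / years_per_generation


if __name__ == "__main__":
    # Parker MRCA mean of 1706 (mentioned earlier) versus a hypothetical
    # known ancestor born about 1766.
    print(generations_between(1706, 1766))  # 2.0

    # The same arithmetic shows why the 99% interval (1293 to 1925) is of
    # little use for placing a match: it spans roughly 21 generations.
    print(round(generations_between(1293, 1925), 1))  # 21.1
```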

Keep in mind that the mean MRCA is an estimate, and actual placement in your family tree can vary. The more matches you have contributing to an MRCA haplogroup estimate, the more accurate the estimate becomes.

Also, note that some Big Y matches opt out of displaying their names, and therefore their matches, in the Match Time Tree, so the picture there may not be complete. For that reason, you can also view the simple Time Tree report, which shows the same graph for your matches but without their names. All matches appear there, but because they are unnamed, identifying them takes the extra step of reviewing the Match List and correlating haplogroup information.

Match Lists

The Big Y match list is not as directly important for genetic network and unlinked family cluster research if you use the Match Time Tree (or simple Time Tree) because Big Y matches are placed in the time tree. While there is other important information within the Match List, it is not immediately helpful for our discussion.

Nevertheless, the Y-111 match list (and to a lesser extent the Y-67 match list) can be very helpful for genetic network research, especially for close matches (fewer genetic distance steps) who have taken a Y-67 or Y-111 test but not the Big Y (see the image below from my Parker ancestry). These match lists can identify other potential cousins who could upgrade to the Big Y, whose reports can then aid the analysis of unlinked family clusters. In the Y-111 list, I concentrate on matches with a genetic distance of 1 to 3, although one confirmed third cousin had a genetic distance of 6.

Match List for Parker Ancestry from Y-DNA test results
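As a rough illustration of that screening step, the Python sketch below filters a hand-transcribed match list for close Y-67 or Y-111 matches who have not yet taken the Big Y. The `YMatch` record, its fields, and the function name are hypothetical; FamilyTreeDNA does not necessarily present or export match lists in this shape. The default 1-to-3 genetic distance window simply mirrors the range mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class YMatch:
    name: str
    surname: str
    genetic_distance: int  # steps at the Y-111 (or Y-67) level
    highest_test: str      # e.g. "Y-37", "Y-67", "Y-111", "Big Y-700"


def upgrade_candidates(matches: list[YMatch], surname: Optional[str] = None,
                       min_gd: int = 1, max_gd: int = 3) -> list[YMatch]:
    """Close matches who have not taken the Big Y, optionally limited to one surname."""
    return [
        m for m in matches
        if min_gd <= m.genetic_distance <= max_gd
        and not m.highest_test.startswith("Big Y")
        and (surname is None or m.surname == surname)
    ]


if __name__ == "__main__":
    matches = [
        YMatch("Match 1", "Hill", 2, "Y-111"),
        YMatch("Match 2", "Hill", 1, "Big Y-700"),  # already on the Big Y
        YMatch("Match 3", "Hill", 6, "Y-111"),      # outside the window
        YMatch("Match 4", "Smith", 3, "Y-67"),
    ]
    print([m.name for m in upgrade_candidates(matches, surname="Hill")])
    # ['Match 1']
```

Raising `max_gd` is reasonable when a match is otherwise promising; as noted above, a confirmed third cousin can sit at a genetic distance of 6.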

Within the match lists, FamilyTreeDNA indicates the highest-level Y-DNA test each match has taken. I've had great success contacting individuals here about upgrading to the Big Y; some upgraded of their own accord, while for others I sponsored the cost. This YouTube learning module directly discusses how to use the match lists in this way.

Case Study: Hill Genetic Network

To visualize and fully appreciate how Y-DNA can complement autosomal DNA analysis, I use my current research efforts on my Hill ancestry.

The Research Plan

Nearly a year ago, I embarked on a research project to identify the parents of my 4x great grandfather, William Hill (1775-1836). Using the EGGOS Search Strategy, I identified a group of shared autosomal DNA matches aligned with my Hill ancestry – a Hill genetic network.

At the same time that I identified the Hill genetic network using autosomal DNA, I also developed a strategy for using Y-DNA. Because my surname is not Hill, I found a male Hill cousin who was willing to participate in the Y-DNA study. He was a direct male-line descendant of William Hill (1775-1836) and my fourth cousin once removed.

Because of an error processing my cousin's Y-DNA sample, his test results at FamilyTreeDNA took six months, about twice as long as normal. He had no close Big Y matches, and the matches he did have did not carry the Hill surname. To make matters worse, the MRCA estimates for these non-Hill matches were around 1200 AD. That is not very helpful when you consider that autosomal DNA tests reliably reach back only about six to eight generations.

Not ready to give up, I consulted my cousin’s Y-67 and Y-111 match lists. My cousin had one Y-111 Hill match with a genetic distance of two, which suggests we might be related in genealogical time. The match’s Hill family was from Garrard County, Kentucky. I contacted the match, and he agreed to upgrade to the Big Y. The results took about four months to fully process.

The Results

An hour before I would be notified via email that the Big Y results for the Y-111 match upgrade had posted online, I had an amazing discovery. Using Viewed Match Switching, which involves selecting another match within the list of shared matches for the Hill genetic network and then reviewing its shared match list, I discovered a match I had not seen before – a match who had Hill ancestry from Garrard County, Kentucky!
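Viewed Match Switching can be pictured as a breadth-first walk through shared-match lists: view one match, then switch to each of its shared matches and review theirs. The Python sketch below assumes the lists have been transcribed into a dictionary by hand. The function name and the `max_hops` cap are hypothetical additions for illustration, since matches reached many switches away from the starting match deserve more scrutiny.

```python
from collections import deque


def switch_and_expand(seed: str, shared: dict[str, set[str]],
                      max_hops: int = 2) -> set[str]:
    """Breadth-first walk over shared-match lists, starting from one viewed match."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        viewed, hops = queue.popleft()
        if hops == max_hops:
            continue  # don't switch any further from this match
        for other in sorted(shared.get(viewed, set())):
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return seen


if __name__ == "__main__":
    shared = {"A": {"B"}, "B": {"A", "C"}, "C": {"D"}}
    print(sorted(switch_and_expand("A", shared, max_hops=2)))
    # ['A', 'B', 'C']
```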

Over the course of an hour, I found 16 matches from four child lines from the progenitor of the Garrard County Hills, and all were associated with the previously discovered Hill genetic networks I had been working with over the past year. Using the principles of DNA Coverage, I replicated the results and found several other Garrard County Hill matches within the match lists of other cousins who descend from my 4x great grandfather and for whom I have access to their match lists. Again, all were shared matches with other Hill descendants of my 4x great grandfather. They had membership within the Hill genetic network.
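One simple way to record that kind of replication is to check, for each relative whose match list you can access, which members of a newly found cluster also appear there. This is only a bookkeeping sketch in Python, not a full implementation of the DNA Coverage principles; the function name, cousin labels, and match names are hypothetical.

```python
def cluster_coverage(cousin_match_lists: dict[str, set[str]],
                     cluster: set[str]) -> dict[str, list[str]]:
    """For each cousin, list the cluster members found in that cousin's matches."""
    return {
        cousin: sorted(matches & cluster)
        for cousin, matches in cousin_match_lists.items()
    }


if __name__ == "__main__":
    lists = {
        "Cousin 1": {"Hill match A", "Hill match B", "Unrelated"},
        "Cousin 2": {"Hill match B", "Hill match C"},
    }
    cluster = {"Hill match A", "Hill match B", "Hill match C"}
    print(cluster_coverage(lists, cluster))
    # {'Cousin 1': ['Hill match A', 'Hill match B'], 'Cousin 2': ['Hill match B', 'Hill match C']}
```

A cluster that shows up in the match lists of several independent descendants is stronger evidence than one seen in a single list.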

An hour later, upon receiving the email from FamilyTreeDNA announcing a new Big Y match, I logged on to see where the new Big Y upgrade match was positioned within the Match Time Tree. The time tree shows that my Hill cousin and the descendant of the Garrard County Hills share a common male Hill ancestor who was likely born around 1757. The haplogroup for the common ancestor is represented by F-FTG62257 in the image below.

Match time tree report from FamilyTreeDNA Big Y-700 DNA test for Hill surname

While the Match Time Tree provides the mean estimated birth year for the MRCA and the 95% confidence interval range (visible in the brackets on either side of the haplogroup icon in the above image), I prefer to consult the scientific details for a more technical view, which is depicted below. Interpreting the report's data depends on your accumulated documentary and DNA knowledge of your ancestor.

Scientific details report from FamilyTreeDNA Big Y-700 DNA test for Hill ancestry

When consulting the report, I tend to work with the mean and the 68% confidence interval, which is the dark blue band immediately surrounding the mean.

I begin my Hill genetic network analysis with the estimated MRCA mean of 1757. My 4x great grandfather, William Hill, was born about 1775, and the progenitor of the Garrard County, Kentucky Hills, John Hill, was born about 1755. With a mean MRCA of 1757 so close to John Hill's approximate birth year of 1755, it is tempting to conclude that the puzzle is solved. However, we need to apply our accumulated knowledge of our ancestor's unique situation when interpreting the mean.

Technically, John Hill could be William Hill's father based solely on estimated birth years and the mean MRCA. However, that seems less likely at this time, given that William Hill was in Lycoming County, Pennsylvania by 1802, while John Hill started and raised his family in Garrard County, Kentucky. Migrations in this period typically moved west and south, rarely back east or north, so it is unlikely that my William Hill was raised in Kentucky and later moved to Pennsylvania. Furthermore, all of my 4x great grandfather's children who lived to 1880, the first census year in which enumerated individuals listed their parents' birthplaces, reported his birthplace as Pennsylvania. None of John Hill's children lived to 1880, but oral history among his descendants suggests either Kentucky or Virginia. Neither is a good fit for William being John's son.

There is, however, evidence of a William Hill who lived on Sugar Creek in Garrard County, where John Hill lived, as early as 1798.[1] This William Hill is reported to have died in 1813 in Detroit while serving in the U.S. 28th Regiment under the command of Thomas L. Butler.[2] This Garrard County William Hill was more likely John Hill's first son than my 4x great grandfather.

Therefore, it is more probable that my 4x great grandfather, William Hill (1775-1836), and the Garrard County progenitor, John Hill (1755-1839), were either brothers or cousins, in which case the true MRCA was likely born earlier than 1757. In fact, my work with my Boyd, McMasters, Parker, and Wilson Y-DNA research projects suggests that the mean will likely move further back in time if I can get more male Hill descendants of either William or John Hill to test, because doing so increases the sample size and therefore the accuracy of the estimated MRCA.

This is where I rely on the 68% confidence interval to further aid my interpretation of the gathered evidence. The interval suggests the MRCA's actual birth year likely falls between 1665 and 1834. Because my 4x great grandfather, William Hill, was born about 1775, the MRCA cannot have been born after 1775. Furthermore, because the MRCA must be John Hill (1755-1839) or one of his ancestors, the MRCA is also unlikely to have been born after about 1755. Unless John Hill himself is the MRCA, which I do not believe is the case for the reasons discussed above, the MRCA's true birth year is probably between 1665 and 1755. Thus, it is probable that John and William Hill are brothers or cousins, which is consistent with an MRCA born before 1755 (John's birth year) and therefore before the current 1757 estimate.
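The narrowing step in that reasoning amounts to clipping the interval with known upper bounds. Here is a minimal Python sketch using the 68% interval (1665 to 1834) and the approximate birth years from the discussion above; the function name is hypothetical.

```python
def clip_interval(low: int, high: int, latest_possible: int) -> tuple[int, int]:
    """Clip an MRCA birth-year interval so it ends no later than a known bound."""
    if latest_possible < low:
        raise ValueError("the bound rules out the entire interval")
    return low, min(high, latest_possible)


if __name__ == "__main__":
    interval = (1665, 1834)                    # 68% interval from the report
    interval = clip_interval(*interval, 1775)  # MRCA born no later than William (~1775)
    print(interval)                            # (1665, 1775)
    interval = clip_interval(*interval, 1755)  # ...and no later than John (~1755)
    print(interval)                            # (1665, 1755)
    mean = 1757
    print(interval[0] <= mean <= interval[1])  # False
```

The reported mean of 1757 falls just outside the clipped range, which is consistent with the expectation above that the estimate will move earlier as more Hill men test.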

In further support of a brother or cousin relationship, I return to the evaluation of the unlinked family clusters within the initial Hill genetic network. I have observed that the Kentucky Hills share autosomal DNA matches with the Clark and Linn unlinked family clusters identified in my prior research (see the image below). However, the Kentucky Hills (in green) do not share membership with the other previously identified unlinked family clusters (in pink). These differences may suggest a generational split, in which the common ancestors of these other clusters married into my line, or the other clusters may simply be more closely related to my line than the Kentucky Hills are.

Expanded Hill genetic network using viewed match switching: identification of the Kentucky branch of the Hills
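The comparison above is essentially a pair of set operations. The Python sketch below uses the cluster surnames named in this post; which clusters the Kentucky Hills share matches with is recorded by hand, and the function name is hypothetical.

```python
def split_by_shared(network_clusters: set[str],
                    shared_with_branch: set[str]) -> tuple[set[str], set[str]]:
    """Split clusters into those a branch shares matches with and those it does not."""
    return network_clusters & shared_with_branch, network_clusters - shared_with_branch


if __name__ == "__main__":
    hill_network = {"Clark", "Linn", "Harris", "Matkins", "McKeel", "Rumley", "Vickers"}
    kentucky_hills_share = {"Clark", "Linn"}
    shared, not_shared = split_by_shared(hill_network, kentucky_hills_share)
    print(sorted(shared))      # ['Clark', 'Linn']
    print(sorted(not_shared))  # ['Harris', 'Matkins', 'McKeel', 'Rumley', 'Vickers']
```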

The addition of Y-DNA has better illuminated how the genealogical puzzle pieces (unlinked family clusters) fit together. It seems that the Clarks and Linns are more closely aligned with the generation(s) where the Kentucky and Pennsylvania Hills converge. Because the Harris, Matkins, McKeel, Rumley, and Vickers clusters do not match the Kentucky Hills, they may be related to a branch of the Pennsylvania Hills that is not shared by the Kentucky Hills. Only further analysis — both DNA and documentary research — can answer that question.

It should also be noted that two other unlinked family clusters have emerged from my initial analysis of the shared matches between the descendants of my Pennsylvania Hills and the Kentucky Hills. The new unlinked clusters are visualized in the above image under the surnames of Boyd (from Mississippi) and Herring (from Michigan). I have added each cluster to my future research plans for further analysis.

Conclusion

Using Y-DNA has greatly informed my autosomal DNA findings and made speculative evidence more conclusive. I now feel more confident that the Hill genetic network I have been working with for the past year is in fact related to my Hill line. I also know that at least the Clark and Linn unlinked family clusters are more closely related to both my Pennsylvania branch and the Kentucky branch of the Hills. As I move forward, I will concentrate on finding evidence of how and where the Hills, Clarks, and Linns converge, most probably a generation or two earlier than 1757.

When incorporating Y-DNA into your autosomal DNA research, consider how long it takes both to find a cousin willing to take a Y-DNA test and for the test results to be completed. In my case, it took nine months, which underscores that autosomal and Y-DNA research should run as parallel paths, not serial ones.

For those interested in additional examples of incorporating Y-DNA testing with autosomal DNA analysis, I encourage you to review either of two research reports, one discovering the ancestral origins of John Wilson and the other identifying the father of William Wilson, or to watch a YouTube learning module that visually presents the analyses from both reports.

In my next post, I correlate migration routes, inter-cluster matching, and segment triangulation for the unlinked family clusters associated with the Hill genetic network to theorize where my 4x great grandfather fits into the Dorchester County, Maryland Hills, which my unlinked family cluster research has suggested.



Sources

[1] Garrard County, Kentucky, land deed, John Keys to Hezekiah Turpin (1798), Volume A, pp. 143-144, Recorder of Deeds, Lancaster; database with images (www.familysearch.org), images 78-79 of 597, film 7899064.

[2] Garrard County, Kentucky, court records 1815-1819, order books, Nancy Hill deposition (1816), Volume 15, p. 111, Lancaster County Court; database with image (www.familysearch.org), image 89 of 637, film 7646900.

2 thoughts on “Using Y-DNA to Analyze Genetic Networks and Unlinked Family Clusters”

  1. Rick, thank you for this excellent post and thorough analysis. I absolutely agree that the combination of deep autosomal research and Y-DNA results holds great promise for the potential to “close-the-connection” between the two technologies and gain reasonable confirmations of distant genetic heritage. While Y testing often has disappointing results, the combination of more testers, especially those recruited, and greater use of the newer autosomal segment data tools really expands the potential here. While the effort can be extensive, tracing autosomal segments back many, many generations is increasingly possible. With chromosome painting, targeted autosomal testing, and strategies like the “ask-the-wife” approach, even more connections become available.

