Custom Clusters: An Evaluation and Application

Last month, Ancestry launched its long-awaited Custom Clusters feature as part of its Pro Tools product offering. In this post, I put it to the test to see if it’s worth the cost and assess how it can advance your genealogy research.

With Custom Clusters, Ancestry Pro Tools subscribers can create clusters of DNA matches aligning with an ancestral line of interest or around an unknown match. Custom Clusters is one of several tools included in the $10 per month additional charge. I’ve previously written that Pro Tools are worth the cost if for no other reason than the Enhanced Shared Matching feature.

Not to bury the lead, but the addition of the Custom Clusters feature is the reason you need to subscribe to Pro Tools if its other features weren’t convincing enough. To be clear, my recommendation is not influenced by Ancestry. I pay for my own subscription, and I receive nothing in return from Ancestry for writing this post.

To explain why Custom Clusters provides additional value and how it can be used to complement your existing research efforts, I draw on my own genealogical research questions and present a use case for the tool. Custom Clusters is not just for advanced DNA researchers but for intermediate ones as well. It can help beginners, but they are better off mastering the basics of DNA analysis before using Custom Clusters, as it might initially seem overwhelming.

How to Use Custom Clusters

There’s not much detailed documentation yet for Custom Clusters but Angie Bush, Research Manager at Ancestry’s ProGenealogists, provides a useful introduction and basic instructions with screenshots. To begin, Custom Clusters are constructed around a match of interest (MOI). In my opinion, an MOI is best found using one of two methods – strategically or diagnostically.

A strategic MOI is a DNA match who descends from your ancestor of interest (AOI) or ancestral couple of interest (ACOI) through a different child line than the one you descend from. I call such a match a lateral cousin (see image below). Ideally, your MOI should not share any other ancestor with you except the AOI or ACOI.

Strategic use of the match of interest (MOI) using lateral cousins, descendant cousins, and ancestral cousins

The AOI could be a great grandfather whose unknown parents you are attempting to discover. Because the DNA tester and the MOI share only the AOI or ACOI, most if not all shared DNA matches will descend from the ACOI either through the DNA tester’s direct ancestral line (descendant cousins) or through another child line of the ACOI (lateral cousins). Descendant and lateral cousins are signals that you have identified the correct genetic network centered around the AOI or ACOI.

What you also hope to find are ancestral cousins, who descend through different child lines of your AOI’s grandparents than the child line from which the DNA tester descends (see image above). Determining how ancestral cousins match you is how you discover your AOI’s parents and grandparents. I’ve written extensively about how to strategically select your AOI through a process I call EGGOS, or the Earliest Generation Group of Siblings search strategy (see EGGOS YouTube learning module).

A diagnostic MOI is a DNA match encountered during the strategic analysis of a cluster of DNA matches whose exact connection to you cannot yet be determined. This could be a match who has no family tree, an incomplete tree, or even a detailed tree. Creating a separate Custom Cluster for this mystery match could help further isolate the ancestral line(s) to which it belongs.

To set up your Custom Cluster, I recommend using Angie Bush’s easy-to-follow instructions. Once you create your first cluster, this post can help you with interpretation. Now let’s look at a case study to see how you can use Custom Clusters.

Case Study: Identifying the Parents of William Hill (1775-1836)

If you follow my blog regularly, you’re likely familiar with my Hill research where I discovered a cluster of DNA matches taking my Hill ancestral line back from Pennsylvania in the late 1700s to Dorchester County, Maryland in the late 1600s. My Hill research started before Ancestry introduced Custom Clusters, so it offers a great comparison for what you can do with and without the Ancestry Pro Tools’ Custom Clusters feature.

In my earlier research, I identified an MOI using my EGGOS search strategy. This MOI descends from another child line of my 4x great grandfather, William Hill (1775-1836), whose parents were unknown (see image below). I used my cousin’s DNA match list and found a cluster of more than 100 DNA matches containing several unlinked family clusters[1], which helped me identify Dorchester County as the place to which my Hill ancestors immigrated from Europe. The unlinked family clusters were for the Clark, Linn, and McKeel families.

Strategic selection of DNA matches for Ancestry's Custom Cluster

For the new Custom Clusters tool, I used the same MOI I used previously, which provides for an easy comparison to what the Shared Match filter and the Enhanced Shared Matches tool can do. I added four sidekick matches to the Custom Cluster who descend through the same child line of William Hill as the MOI. I limited the matches in the Custom Cluster to those sharing between 20 cM and 100 cM (20 cM is Ancestry’s lower threshold limit).
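The range rule described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic as described in this post, not Ancestry's implementation. The function name, match labels, and cM values are all hypothetical.

```python
# Hypothetical illustration of the cM range rule; not Ancestry's code.
def select_cluster_candidates(matches, sidekicks, low_cm=20, high_cm=100):
    """Return names of matches inside the cM range, plus any sidekick matches.

    matches   -- dict mapping a match label to the cM shared with the tester
    sidekicks -- set of match labels to keep even when outside the range
    """
    selected = []
    for name, cm in matches.items():
        in_range = low_cm <= cm <= high_cm
        if in_range or name in sidekicks:
            selected.append(name)
    return sorted(selected)


# Made-up values for demonstration only.
example_matches = {"M1": 212, "M2": 64, "M3": 18, "M4": 23, "M5": 12}
example_sidekicks = {"M3"}  # below 20 cM, but kept because it is a sidekick

print(select_cluster_candidates(example_matches, example_sidekicks))
```

In this made-up example, M1 is dropped for exceeding 100 cM and M5 for falling below 20 cM, while M3 survives only because it was added as a sidekick.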

The resulting Custom Clusters are shown below. Four clusters were generated from the MOI and the four included sidekick matches. The matches in the two smallest clusters (6 and 8 matches) all descend through the same child line of the AOI as the MOI, so they are not very useful since I already know how the MOI is related to me. The 9-match cluster is also composed mostly of descendants from the same child line of the AOI as the MOI, except for one match, which is also represented in the largest cluster of 41 matches.

Ancestry's Custom Cluster overview

The larger cluster of 41 matches is the most useful for my research question as it contains many matches descending from the common ancestor I share with the MOI – William Hill (1775-1836). It also contains many other matches whose relationship to me I do not know, which is what we want. Determining how they connect to William Hill is how we advance our research.

The Custom Cluster contains 41 matches (anonymized below), compared with more than 100 in the cluster I previously created using the Shared Matches filter. The smaller Custom Cluster list is not surprising given that I restricted the range to 20 cM to 100 cM. None of the matches in the Custom Cluster were new to me. I had previously investigated all of them, so the Custom Cluster offered nothing new by way of included matches. I can conclude that my previously used EGGOS search strategy and the Shared Match filter worked very well.

Ancestry custom cluster grid for Hill genetic network

The Custom Cluster excludes matches below 20 cM (except for sidekick matches), which is how Ancestry designed the tool. However, using Enhanced Shared Matches, which is another Pro Tool, I was previously able to see all shared matches below 20 cM. While a few of the shared matches below 20 cM were false matches (or misclassified matches), most were not. In fact, many of the smaller matches had trees, which permitted me to determine the ancestral connections to many of the larger (20 cM or more) matches that had no trees. I used Enhanced Shared Matches in this latter case.

The small-match limitation highlights that the Custom Cluster tool is just that – a tool. You should also replicate the Custom Cluster within your list of matches. That is, select the MOI in your match list and click on the Shared Matches filter. Use all the matches here to help you identify how each match in the genetic network relates to you or the MOI. As I’ve written before, small matches (less than 20 cM) can be informative if you use them appropriately (see also its companion YouTube learning module).

Where I think Custom Clusters added considerable value is with the actual match grid – it helps you quickly see the pattern within your DNA matches. While I was able to “see” the pattern within my original Shared Match list analysis, the Custom Cluster grid can help those of us who are visual learners and need stronger visual cues. I previously recognized that my Hill line matched the Clarks, Linns, and McKeels and that each subcluster also matched one another. However, the grid helps you see the pattern more clearly – and differently.

In the above image, I’ve labeled the subclusters in the grid. The grid contains four smaller clusters. The one closest to the upper left-hand corner appears to share more DNA with my cousin and the MOI as it contains the most known matches descending from William Hill (1775-1836) through not only the two child lines represented by my cousin and the MOI but other child lines, too.

This larger cluster is represented by descendants of the Hills, inclusive of my line and the MOI’s line. It also includes descendants of the McKeel and Clark unlinked family clusters, whose progenitors were Joseph McKeel (1776-1864) and an unknown Clark through several of his children. All three clusters (Hills, Clarks, and McKeels) came from Pennsylvania into Ohio between 1810 and 1820.

Moving toward the lower right-hand corner is a smaller cluster of three matches (MI, CM, and CW). Using the Enhanced Shared Matching tool, I can see that they are all very closely related to one another. I don’t know how they are connected to me or the MOI, but they do have a Clark ancestor, who is unfamiliar to me. Their Clark ancestor came from Pennsylvania into Ohio prior to 1840 – a potential clue to be further investigated.

The third smaller cluster (CI, RP) consists of two closely related matches descending from the same ancestral line as the MOI. They are two of the four sidekick matches I originally added to the Custom Cluster tool.

The fourth and last cluster of 9 matches (CR through JF) consists entirely of descendants from the Linn unlinked family cluster, whose progenitors are Asa Linn (1777-1868) and Elizabeth Hawkins, who moved from somewhere in North Carolina to Tennessee around 1810. There are a few members of the Linn cluster (RG, JJ, AH, EW) who also match the Hill/Clark/McKeel cluster (cluster 1), thereby linking them to the other subclusters within the larger Custom Cluster grid.
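For readers who like to tinker, the idea of a match grid and of "bridging" matches that link two subclusters can be sketched in Python. This is a toy model under stated assumptions: the shared-match pairs, the labels (H1, L1, and so on), and the function names are hypothetical, and the code does not reproduce how Ancestry builds or orders its grid.

```python
from collections import defaultdict

# Toy model of a shared-match grid; all labels and pairs are made up.
def build_shared_match_index(pairs):
    """Map each match to the set of matches it shares DNA with."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def find_bridges(pairs, subclusters, source, target):
    """Return members of `source` who share a match with any member of `target`."""
    index = build_shared_match_index(pairs)
    target_members = set(subclusters[target])
    return sorted(m for m in subclusters[source] if index[m] & target_members)


def render_grid(pairs, order):
    """Draw a text grid: '#' where the row and column matches are shared matches."""
    index = build_shared_match_index(pairs)
    lines = []
    for row in order:
        cells = "".join(
            "#" if (col == row or col in index[row]) else "." for col in order
        )
        lines.append(f"{row:>3} {cells}")
    return "\n".join(lines)


example_pairs = [
    ("H1", "H2"), ("H1", "H3"), ("H2", "H3"),  # a tightly knit first subcluster
    ("L1", "L2"), ("L1", "L3"), ("L2", "L3"),  # a tightly knit second subcluster
    ("L1", "H2"),                              # one tie linking the two
]
example_subclusters = {"first": ["H1", "H2", "H3"], "second": ["L1", "L2", "L3"]}

print(render_grid(example_pairs, ["H1", "H2", "H3", "L1", "L2", "L3"]))
print(find_bridges(example_pairs, example_subclusters, "second", "first"))
```

Passing a different `order` to `render_grid` rearranges the rows and columns, which can make the same subclusters look more or less distinct, so the visual separation in any grid depends partly on how the rows are sorted.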

What’s Missing from the Custom Clusters

Several matches who were part of the original genetic network created using the Shared Match filter were not found in the Custom Cluster. These were important matches because they represent the other unlinked family clusters tying my Hill line back to Dorchester County, Maryland, namely, Matkins and Rumley.

It’s unclear why these matches didn’t appear in the Custom Cluster as several of them were greater than 20 cM, which is the minimum cM threshold to be included in Ancestry’s Custom Clusters. While the Matkins and Rumley matches also matched some of the Hill, Clark, Linn, and McKeel matches, I believe they may not have appeared in the Custom Cluster because of weak ties to the MOI and sidekick matches. While still part of the same genetic network, they have fewer matches to other Hill, Clark, Linn, and McKeel matches.

Although the absence of Matkins and Rumley matches in the Custom Cluster was likely because of weak ties, several other unlinked family clusters were also absent, seemingly for other reasons. Because Custom Clusters only include matches with 20 or more cM, the Harris and Vickers subclusters were missing because they all fall below the 20 cM threshold, although many are above 15 cM. Despite these matches sharing fewer than 20 cM, I have proven them to be valid. These matches are only visible when using the Enhanced Shared Matches tool with the Shared Match filter.
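The two explanations discussed above (falling below the 20 cM threshold versus having weak ties to the cluster) can be checked side by side with a short Python sketch. It assumes you have manually recorded your Shared Match list, the Custom Cluster membership, and who matches whom. The function name, labels, and numbers are hypothetical, and counting ties is only a rough proxy for whatever Ancestry's clustering actually measures.

```python
# Hypothetical helper for comparing a Shared Match list to a Custom Cluster.
def explain_missing(shared_matches, cluster_members, links, min_cm=20):
    """Group matches absent from the cluster by whether they meet the threshold.

    shared_matches  -- dict of match label -> cM shared with the tester
    cluster_members -- set of labels that appeared in the Custom Cluster
    links           -- dict of match label -> set of labels it shares matches with
    Each reported entry is (label, cM, number of ties to cluster members).
    """
    report = {"below_threshold": [], "above_threshold": []}
    for name, cm in sorted(shared_matches.items()):
        if name in cluster_members:
            continue
        ties = len(links.get(name, set()) & cluster_members)
        bucket = "below_threshold" if cm < min_cm else "above_threshold"
        report[bucket].append((name, cm, ties))
    return report


# Made-up values for demonstration only.
example_shared = {"M1": 45, "M2": 31, "M3": 17, "M4": 22, "M5": 16}
example_cluster = {"M1", "M2"}
example_links = {"M3": {"M1", "M2"}, "M4": {"M1"}, "M5": {"M2"}}

print(explain_missing(example_shared, example_cluster, example_links))
```

Matches landing in `above_threshold` with few ties are the ones consistent with the weak-ties explanation, while anything in `below_threshold` is excluded by the cM limit regardless of how well connected it is.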

To gain additional perspective about the missing Matkins and Rumley matches, and potentially other differences between Custom Clusters and other tools, I compare Custom Clusters to Gephi network graphs.

Comparing Custom Clusters to Gephi Network Graphs

Before the introduction of Custom Clusters, I created a Gephi network graph for my cousin’s maternal matches, which is where our shared Hill line resides. I had identified three clusters where the Clark, Linn, McKeel, and other unlinked family cluster matches were found, which are labeled A, B, and C in the image below. Interestingly, the 41 matches included in the Custom Cluster are found only in clusters B and C.

Gephi network graph for the Hill genetic network to compare against Ancestry's custom clusters

The first sub cluster in the upper left-hand corner of the Custom Cluster, which included Hill, Clark, and McKeel matches, is found entirely within cluster B of the Gephi network graph. The second smaller sub cluster in the Custom Cluster is also found in cluster B of the Gephi network graph. My previous work suggested cluster B shares a common DNA segment on chromosome 10.

The third sub cluster in the Custom Cluster is not found in the Gephi network graph because I excluded matches below 15 cM when creating the Gephi graph to decrease clutter in its presentation and to ensure matches were valid. The two matches in the third sub cluster both have fewer than 20 cM but appear in the Custom Cluster because they are two of my sidekick matches.

The fourth and final sub cluster in the Custom Cluster, which included only Linn matches, is found within cluster C in the Gephi network graph. My previous work suggested cluster C shares a common DNA segment on chromosome 2.
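For anyone wanting to try a similar network graph, one way to prepare the data is sketched below in Python. It is not necessarily how the graph above was produced. It assumes you have recorded which matches share DNA with one another, applies a 15 cM cutoff like the one mentioned above, and writes an edge list with Source and Target columns, a layout Gephi's spreadsheet importer can read. The function name, labels, and values are hypothetical.

```python
import csv
import io

# Hypothetical sketch: build a Gephi-style edge list from shared-match pairs.
def edges_to_gephi_csv(match_cm, shared_pairs, min_cm=15):
    """Return CSV text listing pairs where both matches share at least min_cm.

    match_cm     -- dict of match label -> cM shared with the tester
    shared_pairs -- iterable of (label, label) tuples that are shared matches
    """
    kept = {name for name, cm in match_cm.items() if cm >= min_cm}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for a, b in sorted(shared_pairs):
        if a in kept and b in kept:
            writer.writerow([a, b])
    return buffer.getvalue()


# Made-up values for demonstration only.
example_cm = {"M1": 45, "M2": 31, "M3": 14, "M4": 16}
example_pairs = [("M1", "M2"), ("M2", "M3"), ("M1", "M4")]

print(edges_to_gephi_csv(example_cm, example_pairs))
```

Here the pair involving M3 is dropped because M3 shares only 14 cM, which mirrors how a cutoff can remove an entire small cluster from the graph.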

At least in this example, it seems that the Custom Cluster captured the same genetic distance displayed in the Gephi network graph. That is, clusters B and C in the network graph appear to share a common ancestor, but cluster C is perhaps more distantly related as it is spatially more distant from my cousin’s more recent Hill cluster, which is red in the above image. Indeed, the Custom Cluster also placed the Linn cluster (4) farther from the main Hill/Clark/McKeel cluster (1), suggesting a more distant connection.

What is curious is that the Custom Cluster did not include any matches in cluster A of the Gephi network graph. Matches in this cluster also include descendants of Clarks and McKeels, and visually you can see how interconnected the cluster is with clusters B and C, which comprised the Custom Cluster. While my previous work suggested cluster A shares a common DNA segment on chromosome 5, this alone cannot explain why they didn’t appear in the same Custom Cluster, as clusters B and C each suggest different chromosomes for their respective shared segments.

Interpreting Custom Clusters with Documentary Research

The beauty of the Custom Cluster grid is that it shows me something that I could not see as clearly within the manually created Shared Match filter clusters. The Hill/Clark/McKeel cluster is more closely related to my cousin suggesting they share a more recent common ancestor with him than the Linn cluster. There are only a few Hill matches that bridge the two clusters, and these Hill matches tend to have larger shared cM segments with my cousin.

The 1700s Pennsylvania origins of the Hills, Clarks, and McKeels support the above hypothesis. It is probable that this group migrated out of Maryland together as a family unit, dispersing within Pennsylvania after some time. By 1800, each family lived in distinctly different parts of Pennsylvania and subsequently moved to different parts of Ohio (see map below).

Hill genetic network migration from Dorchester County, Maryland into Pennsylvania in the 1700s

If the families traveled together, they likely parted ways in either Chester, Lancaster, or York Counties, which are located along the eastern portion of the Forbes Road and were often the first migration stops into early Pennsylvania before heading further west. The most probable route out of Maryland to Pennsylvania would have been by boat from Dorchester to the north end of the Chesapeake Bay and then up the Susquehanna River.[2] This would make Lancaster or York Counties their likely initial early residences. Other alternative migration routes are depicted on the above map, but they still lead to the Forbes Road.

The Forbes Road, or its earlier names of Raystown Path or Old Trader Path, was a principal route across Pennsylvania at that time.[3] From here, it would seem the families went their separate ways either continuing west across the Forbes Road (McKeel to Cumberland County and Clarks to Washington County) or north up the Susquehanna River (Hill to Lycoming County).

My documentary research has found that the McKeels and Hills shared a fence line in Dorchester County, Maryland for more than 100 years from the late 1600s to the mid 1700s.[4] The Clarks were from neighboring Delaware.[5] However, one Dorchester Hill descendant moved to Delaware in the 1730s suggesting a possible earlier migration out of Maryland to Delaware before perhaps moving on to Pennsylvania.[6]

Given the North Carolina origins of the Linns, it is probable that a Hill descendant moved from Dorchester to North Carolina in the 1700s. Indeed, other Dorchester County Hill descendants (Matkins) moved to Caswell County, North Carolina in the late 1700s, and other affiliated families from Dorchester (Rumley) moved to other parts of North Carolina. However, no documentary research provides additional insight into where the Hill descendants and Linns might have met because the North Carolina origins of the Linns are not known.

Putting It All Together

Like all DNA analysis tools before it, Custom Clusters is not a panacea for our genealogical problems. However, it does offer a new and strategic methodology that is more visual than other tools and takes advantage of one of the largest DNA databases with excellent family tree visualization. Custom Clusters can assist in formulating proof arguments, and herein is mine.

Recalling that William Hill (1775-1836) is my 4x great grandfather, the working hypothesis now is that the Pennsylvania Hills, Clarks, and McKeels converge at my 5x or 6x great grandparent level perhaps in Pennsylvania or Delaware. The Linns likely converge at the 6x or 7x great grandparent level probably back in Dorchester. Exactly how remains elusive, but I am getting closer. The Custom Cluster tool gives me a new lens through which to view my collected DNA and documentary research.

In fact, moving forward, I intend to use Custom Clusters in a manner akin to increasing the DNA coverage of William Hill (1775-1836) by running similar clusters for other cousins who have graciously shared their DNA match lists with me. I want to compare how differently Custom Clusters appear for those descending from the progenitors of the Clark, Harris, Linn, Matkins, McKeel, Rumley, and Vickers unlinked family clusters compared to those cousins descending from William Hill. The hope is that I can triangulate the results to walk each unlinked family cluster back to its Dorchester County, Maryland origins.



Sources

[1] An unlinked family cluster is a large group of matches who all descend from a single ancestor but for whom you are unable to establish a genetic relationship. Sometimes the common ancestor of the unlinked family cluster shares the same surname as the ancestor you’re researching. See for example, https://myfamilypattern.com/geneticnetworks5/.
[2] Maryland State Archives (n.d.), Colonial and Early National Transportation, 1700-1800. Retrieved 5 November 2025 from https://www.roads.maryland.gov/OPPEN/II-Colon.pdf.
[3] FamilySearch Wiki (n.d.), Forbes Road. Retrieved 5 November 2025 from https://www.familysearch.org/en/wiki/Forbes_Road.
[4] Mowbray, Calvin W. and Mary I. Mowbray (1992), Early Settlers of Dorchester County and Their Lands, Volume 1. Westminster, MD: Heritage Books.
[5] See for example, Inter-state Publishing Company (1886). Biographical and Historical Record of Wayne and Appanoose Counties, Iowa. Chicago, IL: Inter-state Publishing Company, p. 469.
[6] Dorchester County, Maryland, Recorder of Deeds, John Hill to Thomas Mackeel (1741), land deed, Old Book 10, p. 349.

9 thoughts on “Custom Clusters: An Evaluation and Application”

  1. I scanned your blog post just to see if the clustering feature included matches with less than 20 cM. My biggest beef with Ancestry is that 20cM lower limit. I really wish they would lower it down to at least 15cM on this feature and the “Shared Matches” feature. I would pay for this if they did.

    • I also wish for this. I believe the best action to take is to use Custom Clusters in tandem with the Shared Match filter. The latter lets you see the full picture, but Custom Clusters lets you see a portion of the picture really well.

  2. Thank you for your excellent post! To be clear, I’m not a blogger per se, rather I am a Sr. Research Manager with Ancestry’s research arm, ProGenealogists, with degrees in molecular biology and biotechnology. I’ve been working with others at Ancestry to bring some additional tools for advanced users to the site and am glad to see that this one is working for you.

    To understand why some of the matches aren’t appearing as you’d expect, you might study the three types of clustering Ancestry employs with its ensemble strategy: Louvain, Agglomerative Hierarchical, and Spectral. Each has strengths and weaknesses, and you might be able to see what you’re expecting based on shared matches if you play around with some of the parameters a bit.

    • Hi Angie – question for you about Ancestry’s clustering mechanisms …
      Does it take into account shared cM? I was wondering because that would be helpful in clustering closer relatives but could confuse the issue in more distant relationships.
      I’m sure that was taken into account already but I was just wondering how it’s handled?
      Thanks – Kit

  3. Thanks for a very interesting discussion.

    One thing I have not seen much written about is the selection of the 4 sidekick matches.
    Are there more or less successful strategies as to how to pick those? It seems that the later choices change depending on the initial choices? How does that matter?

    • My understanding of the sidekick matches is that they help to refine the custom cluster created from the selected match of interest (MOI) to ensure you receive a cluster for the potential shared ancestor between the DNA tester and the MOI. Because Ancestry only presents matches with 20 or more cM, adding sidekick matches on the same ancestral line as the MOI improves the number of qualified matches included in the cluster. The sidekick matches might match others in the larger genetic network that the MOI does not, thereby increasing the number of relevant matches that you can investigate.

    • I don’t often use SideKick matches. They only need to be used if the amount of DNA that they share falls outside of the cM range that you select in the next step.

  4. Very interesting blog.
    My observation about the Ancestry clustering is that the way the rows are ordered makes the clusters look more distinct than they would be otherwise. For example, if TC in the above cluster is moved to the top row, then the first two clusters appear to be more similar.
    My Ancestry clusters are similar – I’m using Python code to help me move the rows around and play with the clusters to better see where the differences are.

