Clustering Tools for DNA matches

This month I am passing the baton! Last time I wrote about 10 Tips to Trial a Tool, and I decided that despite the temptation, I didn’t have time yet to explore the new clustering tools. Fortunately, Andrea Ackermann, one of my fellow team leaders at our Central Indiana DNA Interest Group, has taken the plunge. So I’ve invited her to share her thoughts on clustering tools here as a guest blogger. Welcome, Andrea!

 

Clustering Tools for DNA Matches – a Beginner’s Guide by Andrea Ackermann

Clustering tools for DNA matches are all the buzz these days. And to be clear, I mean autosomal DNA matches: those from AncestryDNA, 23andMe, MyHeritage, and Family Finder from FamilyTreeDNA. Autosomal matches are very useful for validating family lines AND for solving genealogical mysteries. I use shared matches to create my own genetic networks to tackle brick walls. They can take my research only so far, and they don’t always answer my questions. The new clustering tools piqued my interest because they may be able to find groupings that I haven’t identified by other methods.

What are clustering tools?

These clustering tools are an automated way to group your matches. The matches are listed both down and across the graph to show who matches whom. The clusters are presented as colored groupings on the graph with corresponding data tables that provide more detail. The diagonal line is where each match matches themselves. See Figure 1.

The idea is that each cluster potentially shares a common ancestor. The descendants of Martin and Minnie are in the red box and those of Jotham and Mary are in the blue box. Some clusters contain descendants of a couple, and some in the cluster may be related to just one of the couple. Any tool that automates shared matching is very useful. Think of it as a clue detector. The clusters don’t prove relationships, but they provide insights.
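To make the grid idea concrete, here is a minimal sketch in Python. It is not how Genetic Affairs or the DNAgedcom Client actually work internally; it simply builds a small "who matches whom" grid from made-up shared-match pairs and then groups matches that are linked to one another, using plain connected components as a stand-in for real clustering. The names and the `build_matrix` and `find_clusters` helpers are hypothetical.

```python
# Hypothetical shared-match pairs: each pair means the two matches
# also appear on each other's shared-match list.
shared_pairs = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Carol"),
    ("Dan", "Eve"),
]

def build_matrix(pairs):
    """Return the sorted match names and a symmetric 0/1 grid."""
    names = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[0] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = 1  # the diagonal: each match matches themselves
    for a, b in pairs:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return names, matrix

def find_clusters(names, matrix):
    """Group matches that are linked directly or through other matches.

    This is a simple connected-components walk, used here as an assumption;
    the real tools use their own clustering methods.
    """
    seen = set()
    clusters = []
    for start in range(len(names)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(names[node])
            for other, linked in enumerate(matrix[node]):
                if linked and other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(group))
    return clusters

names, matrix = build_matrix(shared_pairs)
clusters = find_clusters(names, matrix)

# Print the grid: '#' where two people match, '.' where they do not.
for row_name, row in zip(names, matrix):
    print(f"{row_name:>6} " + " ".join("#" if cell else "." for cell in row))
print(clusters)
```

In this toy data, Alice, Bob, and Carol all match each other and fall into one group, while Dan and Eve form a second group. Each group is only a hint that its members may share a common ancestor.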

I have explored both Genetic Affairs (see Figure 1) and Collins’ Leeds Method 3D from DNAgedcom.com (see Figure 2).

Figure 1. Excerpt of Genetic Affairs output

 

 

 


Figure 2. Excerpt of Collins’ Leeds Method 3D output from DNAgedcom.com

Last month, Ann blogged about her “10 tips to trial a tool”. Here is how I would apply those tips to these tools.

1. Do I have what I need to use it?

I have results from an autosomal DNA test with at least one of the four major testing companies. I ran these tools on a Windows PC; check the tool documentation if you’re using a Mac.

 

2. What does it cost?

  • Genetic Affairs by Evert-Jan Blom gives you 200 credits to try the program. After that, you can purchase additional credits. Each AutoCluster run costs 25 credits (25 cents USD).
  • The DNAgedcom Client offers the Collins’ Leeds Method 3D. The DNAgedcom Client costs $5/month. During the month, you can run as many reports as you want.

The cost is low if used on an occasional basis.
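If you want to compare the two pricing models, the arithmetic is simple enough to sketch. The figures below are the ones quoted above and may have changed since; the `genetic_affairs_cents` helper is a hypothetical name used only for illustration.

```python
# Prices as quoted in this post; check the current pricing before relying on them.
FREE_CREDITS = 200               # trial credits from Genetic Affairs
CREDITS_PER_RUN = 25             # credits per AutoCluster run
CENTS_PER_CREDIT = 1             # 25 credits = 25 cents USD
DNAGEDCOM_CENTS_PER_MONTH = 500  # $5 per month, unlimited reports

# How many AutoCluster runs the trial credits cover.
free_runs = FREE_CREDITS // CREDITS_PER_RUN

def genetic_affairs_cents(runs, free_credits=0):
    """Cost in cents for a number of runs, after using any free credits."""
    credits_needed = max(0, runs * CREDITS_PER_RUN - free_credits)
    return credits_needed * CENTS_PER_CREDIT

# Number of paid runs in one month at which the two options cost the same.
break_even_runs = DNAGEDCOM_CENTS_PER_MONTH // (CREDITS_PER_RUN * CENTS_PER_CREDIT)

print(f"Runs covered by trial credits: {free_runs}")
print(f"Break-even paid runs per month: {break_even_runs}")
for runs in (2, 8, 20, 40):
    cost = genetic_affairs_cents(runs) / 100
    print(f"{runs:>2} paid runs: Genetic Affairs ${cost:.2f} vs DNAgedcom Client $5.00")
```

On these numbers, the trial credits cover 8 runs, and pay-per-run stays cheaper than the monthly fee until you reach about 20 paid runs in a month.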

 

3. How easy is it to learn and use?

Genetic Affairs requires you to set up an account. Easy enough. You will be required to enter your login and password for each company. Overall, it is fairly easy and straightforward to use. Some good blog posts on getting started with this tool are by The Intrepid Sleuth and by Roberta Estes at DNA-explained.

Collins’ Leeds Method 3D requires a DNAgedcom Client account. It also requires that the Matches and ICW (in common with) files be gathered with the Client before the clustering can be performed. The reports may take some time to run. The program runs on your own computer. Kitty Cooper has written a helpful blog post about this tool.

The programs will ask for match parameters in centimorgans (cM) to determine which matches you want to include. It took me a few tries to find the parameters that delivered reasonable results. I usually start with a range of 250 cM down to 20 cM, approximately second to fourth cousins. Your specific situation and needs may require different parameters; you may want to experiment.
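Here is a minimal sketch of what those parameters do, assuming a simple list of matches with their shared cM. The match names, values, and the `select_matches` helper are made up for illustration; the actual tools handle this step for you.

```python
# Hypothetical match list; names and cM values are invented for illustration.
matches = [
    {"name": "Match A", "shared_cm": 1750.0},  # above the upper limit
    {"name": "Match B", "shared_cm": 240.5},
    {"name": "Match C", "shared_cm": 96.0},
    {"name": "Match D", "shared_cm": 20.0},
    {"name": "Match E", "shared_cm": 12.3},    # below the lower limit
]

def select_matches(matches, min_cm=20.0, max_cm=250.0):
    """Keep matches whose shared cM falls inside the chosen range (inclusive)."""
    return [m for m in matches if min_cm <= m["shared_cm"] <= max_cm]

# Default range of 250 cM down to 20 cM.
selected = select_matches(matches)
print([m["name"] for m in selected])

# Raising the lower limit drops the more distant matches.
narrower = select_matches(matches, min_cm=50.0)
print([m["name"] for m in narrower])
```

Changing `min_cm` or `max_cm` changes which matches are eligible for clustering at all, which is one reason different settings can produce different clusters.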

Overall, these programs require the ability to set up accounts and manage files. The ability to read graphs and tables is helpful.

 

4. How much time does it take?

The amount of time it takes will vary by program. Genetic Affairs will probably take the least amount of time, maybe 15-30 minutes total. Reports on the DNAgedcom Client can take an hour or two, or more, to run. Working with the resulting spreadsheets can also take some time, depending on the user’s knowledge. The time needed to interpret the results and absorb everything will vary. The idea is to correlate the clusters with your family tree, or at least with a grandparent line.

**TIP** Limit your interest to one family line or group of unknown matches. Focus your research efforts. Look for known cousins to identify clusters. Come back to additional family lines.

 

5.  What resources are there to help me?

Louis Kessler wrote two detailed blog posts about genetic clustering: Genetic Clusters and DNAgedcom and Comparing Genetic Clusters. Developers also provide documentation.

For more interactive help, consider Facebook groups such as Genetic Affairs and DNAgedcom User Group.

 

6. Does it work reliably, or are there frequent hiccups?

Some hiccups have been worked out. An occasional problem for Genetic Affairs and DNAgedcom Client occurs when the DNA companies restrict developers’ access.

Another word of caution: Genetic Affairs will default to weekly reports on new matches after AutoCluster has been performed. This is a subscription that costs credits that you purchase. You can change the schedule to NEVER if you don’t want to subscribe, or you can set parameters of your choosing: which companies or kits to check for new matches, what size threshold merits an alert, and how often you want to be notified. Check out the pricing.

The two blog posts by Louis Kessler mentioned above compare the various genetic clustering programs. Louis’s conclusion was that the results differ between programs. My results also differed. These programs are new, so expect future improvements.

 

7. Any privacy concerns? (Do I need to give it my passwords? Does it store data on my drive or on the web? Do I care?)

Genetic Affairs requires you to enter your login and password for each database. I simply delete each website from my Genetic Affairs account after AutoCluster has run. The DNAgedcom Client runs on your own computer.

 

8. Can I accomplish the same results another way?

Yes, you can accomplish clustering by hand. Two years ago, Blaine Bettinger wrote about Clustering Shared Matches. Dana Leeds later developed the methodology for manual DNA Color Clustering, which works with matches from any company. These methods can be time-consuming and may yield slightly different results.
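For readers who like to see the logic spelled out, here is a simplified sketch of the color clustering idea as it is commonly described: work down your match list, give the first uncolored match a new color, and give that same color to everyone on that match's shared-match list. This is my own illustration, not Dana Leeds's published instructions, and the match labels, the `shared` lists, and the `color_cluster` function are hypothetical.

```python
def color_cluster(ordered_matches, shared):
    """Assign color numbers to matches, working down the list in order.

    ordered_matches: match names, assumed sorted from most to least shared cM.
    shared: dict mapping a match name to the set of names on its shared-match list.
    Returns a dict of match name -> list of color numbers.
    """
    colors = {name: [] for name in ordered_matches}
    next_color = 1
    for name in ordered_matches:
        if colors[name]:
            continue  # already colored by an earlier match
        colors[name].append(next_color)
        for other in shared.get(name, ()):
            if other in colors and next_color not in colors[other]:
                colors[other].append(next_color)
        next_color += 1
    return colors

# Hypothetical data: M5 appears on the shared-match lists of both M1 and M3.
ordered = ["M1", "M2", "M3", "M4", "M5"]
shared = {
    "M1": {"M2", "M5"},
    "M2": {"M1"},
    "M3": {"M4", "M5"},
    "M4": {"M3"},
    "M5": {"M1", "M3"},
}

result = color_cluster(ordered, shared)
for name in ordered:
    print(name, result[name])
```

In this toy example M1 and M2 share color 1, M3 and M4 share color 2, and M5 ends up with both colors. A match with more than one color is worth a closer look, since it may connect to more than one family line.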

There are other tools that automate match clustering, not reviewed here. For example:

 

9. How much fun is it? 

These programs are fun. Genetic Affairs and Collins’ Leeds Method 3D show more information when the pointer hovers over a data point. The gray boxes yield more information. I like the visualization of my match clusters. The Collins’ Leeds Method 3D on DNAgedcom has a link to each match’s family tree! Great feature.

 

10. And perhaps most important, how will it add value to my genealogy? 

Auto clustering tools will add value to my genealogy. A cluster may be the key to a brick wall genealogical problem. Today, I’m applying it to explore my previously unknown Norwegian grandmother cluster. See Figure 3.


Figure 3

 

Each program may give different results; in other words, a different clue. Traditional paper research is still needed. At the end of the day, clustering tools are just automated ways to mine your matches for new leads. DNA and traditional documentary evidence need to dovetail with each other.

Andrea Ackermann (c) 2019

8 thoughts on “Clustering Tools for DNA matches”

  1. EJ Blom

    Very nice comparison. I might add that I’ve removed the default weekly update setting for new AutoCluster analyses. Next, for additional privacy you can set up a dummy Ancestry account with limited rights. Moreover, AutoCluster works for 23andMe, FTDNA and Ancestry. MyHeritage is still on the to-do list. Also, due to time limitations, AutoCluster runs for a certain amount of time, which might prevent the download of some 4th/distant cousins. In that case (especially when you have endogamy), the DNAGedcom approach is better suited, since it will download much longer and can reach the more distant clusters. Currently the tree links are not functioning in AutoCluster, but I am working on that. Last, for 23andMe/FTDNA runs we also perform a surname enrichment analysis to identify surnames that might be indicative of common ancestors in your AutoClusters. Please feel free to visit our Facebook group!

  2. Dana Leeds

    I like this 10 point guide for reviewing a new tool. I agree that working with one cluster or group of related clusters at a time is a great way to use these tools. And, as you mentioned, not only do the different tools give slightly different results, but you will also get different clusters based on the different parameters you use.

  3. Scott

    My family tree is pretty well defined for the last 400 years, but I have some 5th and 6th great-grandmothers who have no maiden names. Would this tool be useful in tracking them down?

    1. dnasleuth Post author

      Hi Scott! These tools are quite new, so I haven’t heard any success stories for situations like yours, at least not yet. To be honest, autosomal DNA is most useful for ancestors 5-7 generations back. It can help confirm the biological ancestors you have a paper trail for in that span, but it sounds like your gaps are right on the cusp of being too far back. Still, you can try clustering for free, so you might give it a try! ~ Ann

  4. Paul Baltzer

    Great review. It was interesting to learn from the Comments that Genetic Affairs (AutoCluster) might time out and not include some 4th cousins and more distant cousins compared to Collins’ Leeds Method 3D.

  5. Pingback: DNA clustering links – Mainely Genealogy
