This month I am passing the baton! Last time I wrote about 10 Tips to Trial a Tool, and I decided that despite the temptation, I didn’t have time yet to explore the new clustering tools. Fortunately, Andrea Ackermann, one of my fellow team leaders at our Central Indiana DNA Interest Group, has taken the plunge. So I’ve invited her to share her thoughts on clustering tools here as a guest blogger. Welcome, Andrea!
Clustering Tools for DNA Matches – a Beginner’s Guide by Andrea Ackermann
Clustering tools for DNA matches are all the buzz these days. To be clear, I mean autosomal DNA matches: those from AncestryDNA, 23andMe, MyHeritage, and Family Finder from FamilyTree DNA. Autosomal matches are very useful for validating family lines AND for solving genealogical mysteries. I use shared matches to build my own genetic networks to tackle brick walls. They can take my research only so far, though; they don’t always answer my questions. The new clustering tools piqued my interest because they may be able to find groupings that I haven’t identified by other methods.
What are clustering tools?
These clustering tools are an automated way to group your matches. The matches are listed both down the side and across the top of the graph to show who matches whom. The clusters appear as colored groupings on the graph, with corresponding data tables that provide more detail. The diagonal line is where each match intersects with themselves. See Figure 1.
The idea is that each cluster potentially shares a common ancestor. The descendants of Martin and Minnie are in the red box, and those of Jotham and Mary are in the blue box. Some clusters contain descendants of both members of a couple, while some matches in a cluster may be related to only one of the two. Any tool that automates shared matching is very useful. Think of it as a clue detector: the clusters don’t prove relationships, but they provide insights.
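To make the idea concrete, here is a minimal Python sketch that groups matches by their shared matches and lays them out as a down-and-across grid. It is a simplified greedy grouping for illustration only, not the algorithm used by Genetic Affairs or DNAgedcom, and the match names and shared-match lists are hypothetical.

```python
# Hypothetical shared-match ("in common with") lists: each match maps to
# the other matches who also share DNA with you and with them.
SHARED = {
    "Ann":  {"Bob", "Cara"},
    "Bob":  {"Ann", "Cara"},
    "Cara": {"Ann", "Bob"},
    "Dave": {"Eve"},
    "Eve":  {"Dave"},
    "Fay":  set(),
}


def greedy_clusters(shared):
    """Walk the matches in order; each match not yet placed starts a new
    cluster together with its shared matches that are still unplaced."""
    placed = set()
    clusters = []
    for match in shared:  # insertion order stands in for a cM-sorted list
        if match in placed:
            continue
        members = [match] + [m for m in sorted(shared[match]) if m not in placed]
        placed.update(members)
        clusters.append(members)
    return clusters


def grid(shared, order):
    """Rows and columns list the same matches. 'X' marks a shared match,
    '#' marks the diagonal where each match meets themselves."""
    rows = []
    for r in order:
        cells = ["#" if r == c else ("X" if c in shared[r] else ".") for c in order]
        rows.append(f"{r:<5}" + " ".join(cells))
    return rows


if __name__ == "__main__":
    clusters = greedy_clusters(SHARED)
    order = [m for cluster in clusters for m in cluster]
    for line in grid(SHARED, order):
        print(line)
```

In this sketch, ordering the rows and columns cluster by cluster is what makes each group show up as a block along the diagonal, much like the colored boxes in the figures.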
I have explored both Genetic Affairs (See Figure 1) and Collins’ Leeds Method 3D from DNAgedcom.com (See Figure 2).
Last month, Ann blogged about her “10 tips to trial a tool”. Here is how I would apply those tips to these tools.
1. Do I have what I need to use it?
I have results from an autosomal DNA test with at least one of the four major testing companies. I ran these tools on a Windows PC; check each tool’s documentation if you’re using a Mac.
2. What does it cost?
- Genetic Affairs by Evert-Jan Blom will give you 200 credits to try the program. After that, you can purchase additional credits. Each AutoCluster run costs 25 credits (25 cents in USD).
- The DNAgedcom Client offers the Collins’ Leeds Method 3D. The DNAgedcom Client costs $5/month. During the month, you can run as many reports as you want.
Either way, the cost is low for occasional use.
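For a quick comparison of the prices quoted above, here is a small Python calculation. It works in whole cents to avoid rounding surprises, uses only the figures stated in this post, and ignores other credit costs such as the weekly new-match reports; the constant and function names are hypothetical.

```python
import math

FREE_CREDITS = 200             # starter credits from Genetic Affairs
CREDITS_PER_RUN = 25           # one AutoCluster run
CENTS_PER_CREDIT = 1           # 25 credits = 25 cents, so 1 credit = 1 cent
DNAGEDCOM_MONTHLY_CENTS = 500  # $5/month for unlimited DNAgedcom Client reports


def free_runs():
    """AutoCluster runs covered by the starter credits."""
    return FREE_CREDITS // CREDITS_PER_RUN


def autocluster_cost_cents(runs):
    """Cost of paid AutoCluster runs, in cents."""
    return runs * CREDITS_PER_RUN * CENTS_PER_CREDIT


def break_even_runs():
    """Paid AutoCluster runs whose cost first reaches one month of DNAgedcom."""
    return math.ceil(DNAGEDCOM_MONTHLY_CENTS / (CREDITS_PER_RUN * CENTS_PER_CREDIT))


if __name__ == "__main__":
    print(free_runs())                # 8
    print(autocluster_cost_cents(4))  # 100
    print(break_even_runs())          # 20
```

So the starter credits cover eight runs, and it would take about twenty paid runs in one month before AutoCluster cost as much as a month of DNAgedcom. The two tools produce different output, so price is only one factor.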
3. How easy is it to learn and use?
Genetic Affairs requires you to set up an account. Easy enough. You will also need to enter your login and password for each testing company. Overall, it is fairly easy and straightforward to use. Good blog posts on getting started with this tool include those by The Intrepid Sleuth and by Roberta Estes at DNA-explained.
Collins’ Leeds Method 3D requires a DNAgedcom Client account. It also requires that you run the Matches and ICW (in common with) reports in the Client before clustering can be performed. These reports may take some time to run. The program runs on your own computer. Kitty Cooper has a helpful blog post about this tool.
The programs will ask for match parameters in centimorgans (cM) to determine which matches you want to include. It took me a few tries to find parameters that delivered reasonable results. I usually start with 250 cM down to 20 cM, approximately second to fourth cousins. Your specific situation and needs may call for different parameters, so you may want to experiment.
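The centimorgan window is essentially a filter on your match list. Here is a minimal Python sketch, assuming a hypothetical list of matches with their total shared cM and treating both limits as inclusive (the actual tools may handle the boundaries differently).

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str         # hypothetical match name
    shared_cm: float  # total shared centimorgans reported by the testing company


def in_cm_window(matches, upper_cm=250.0, lower_cm=20.0):
    """Keep matches whose shared cM falls between lower_cm and upper_cm
    (inclusive), strongest first."""
    kept = [m for m in matches if lower_cm <= m.shared_cm <= upper_cm]
    return sorted(kept, key=lambda m: m.shared_cm, reverse=True)


if __name__ == "__main__":
    matches = [
        Match("Close relative", 612.0),
        Match("Possible 2nd cousin", 248.5),
        Match("Possible 4th cousin", 35.0),
        Match("Distant match", 12.0),
    ]
    for m in in_cm_window(matches):
        print(f"{m.name}: {m.shared_cm} cM")
```

If the clusters come out too large or too fragmented, changing upper_cm and lower_cm is the kind of experiment suggested above.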
Overall, these programs require the ability to set up accounts and manage files. The ability to read graphs and tables is helpful.
4. How much time does it take?
The time required will vary by program. Genetic Affairs will probably take the least, perhaps 15 to 30 minutes in total. Running reports in the DNAgedcom Client can take an hour or two, or longer. Working with the spreadsheets can also take some time, depending on your familiarity with them. The time needed to interpret and absorb the results will vary. The goal is to correlate each cluster with your family tree, or at least with a grandparent line.
**TIP** Limit yourself to one family line or one group of unknown matches at a time so your research stays focused. Look for known cousins to identify clusters. Come back to other family lines later.
5. What resources are there to help me?
6. Does it work reliably, or are there frequent hiccups?
Some hiccups have been worked out. An occasional problem for Genetic Affairs and DNAgedcom Client occurs when the DNA companies restrict developers’ access.
Another word of caution: Genetic Affairs defaults to weekly reports on new matches after AutoCluster has been run. (This is a subscription that uses the credits you purchase. You can change the schedule to NEVER if you don’t want to subscribe, or you can choose which companies/kits to check for new matches, what size threshold merits an alert, and how often you want to be notified. Check out the pricing.) The two blog posts by Louis Kessler mentioned above compared the various genetic clustering programs. Louis concluded that the programs’ results differed. My results also differed. These programs are new, so expect future improvements.
7. Any privacy concerns? (Do I need to give it my passwords? Does it store data on my drive or on the web? Do I care?)
Genetic Affairs requires you to enter your login and password for each testing company. I simply delete each company’s login details after AutoCluster has run. The DNAgedcom Client runs on your own computer.
8. Can I accomplish the same results another way?
Yes, you can accomplish clustering by hand. Two years ago, Blaine Bettinger wrote about Clustering Shared Matches. Dana Leeds later developed the methodology for manual DNA Color Clustering, which works with matches from any company. These methods can be time-consuming and may yield slightly different results.
There are other tools that automate match clustering, not reviewed here. For example:
9. How much fun is it?
These programs are fun. Genetic Affairs and Collins’ Leeds Method 3D show more information when the pointer hovers over a data point. The gray boxes yield more information. I like the visualization of my match clusters. The Collins’ Leeds Method 3D on DNAgedcom has a hot link to each match’s family tree! Great feature.
10. And perhaps most important, how will it add value to my genealogy?
Automated clustering tools will add value to my genealogy. A cluster may be the key to a brick wall genealogical problem. Today, I’m applying it to explore my previously unknown Norwegian grandmother cluster. See Figure 3.
Each program may give different results; in other words, a different clue. Traditional paper research is still needed. At the end of the day, clustering tools are just automated ways to mine your matches for new leads. DNA and traditional documentary evidence need to dovetail with each other.
Andrea Ackermann (c) 2019