Shared cM Tool revisited

I think the MVP of my genetic genealogy toolkit must be the Shared cM Tool on Jonny Perl’s DNAPainter website here:  https://dnapainter.com/tools/sharedcmv4 .

There are different sets of data you can use here when evaluating a DNA match. Do you know the difference between them and which is better for your purpose? Let’s explore!

At the top of the webpage, the left side contains static links—see the section marked ‘A’ in image 1—where you can find more information about the project. (For space considerations, some of this section may not appear on a mobile view.) To the right of this—see the section marked ‘B’ in image 1—there is a dynamic section where you can input the amount of cM two test takers share.

Image 1.

Now it’s time to ask yourself: Do you have

1. very little idea how these two test takers are related? Or

2. a hypothesis who the shared ancestor is and want to know if the size of the match supports it?

Getting started

Whichever the case, you can input a value in the Filter box, like 156 cM, and the tool will offer you the odds for various relationship possibilities for that amount of shared DNA. See image 2. In fact, it puts these relationship possibilities into one or more ‘buckets’; every relationship in a given bucket is just as likely as the other relationships in that same bucket. For example, the odds that two test takers who share 156 cM are half second cousins are 53%, the same odds as second cousins once removed. For more on why some relationships share the same odds, check out this 2017 blog post by the DNA Geek, Leah Larkin: https://thednageek.com/meiosis-schmeiosis/.

Image 2.

This probability chart, which ties into the What Are the Odds tool at DNAPainter as well, is great for helping us create hypotheses for unknown common ancestors.

You may try to ascertain the ages of the test takers in a match pair and speculate whether they are the same generation or one or more generations apart. (The smaller the match, the harder it may be to make an accurate guess about this though.) Then you use the highest probability bucket to predict how far back the common ancestor likely is. If you like, you can even try out the link labeled ‘New: View these relationships in a tree’.

From the resulting probabilities, you can develop a research plan to seek out documentary evidence for the most likely ancestral candidates, and you may even want to recruit more test takers in certain lines (prioritized in part by these probability results) to support or refute your evolving hypothesis. If your research and/or your recruited test takers produce evidence that your first hypothesis didn’t pan out, you can explore the relationship that scored the next highest probability.

Suppose you’ve concluded that the test takers are likely the same generation. In this example, you’ll see that a relationship in the first bucket (such as half second cousins) is more than twice as likely as one in the second bucket (such as full second cousins) and more than three times as likely as one in the next bucket (e.g., full third cousins).

  • 53% – half second cousins
  • 26% – full second cousins
  • 15% – full third cousins

This doesn’t mean they aren’t third cousins. But when you have no other idea how they are related, the probabilities help you plan your next steps for additional documentary research or DNA testing; in this case, considering half-second cousins first.
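For readers who like to see the arithmetic, here is a minimal Python sketch of that comparison. The three percentages are the ones quoted above for a 156 cM match; the function and variable names are my own illustration and are not part of the DNAPainter tool.

```python
# Percentages quoted above for a 156 cM match (default view),
# limited to the same-generation examples discussed in the text.
buckets = {
    "half second cousins": 53,
    "full second cousins": 26,
    "full third cousins": 15,
}

def rank_buckets(probabilities):
    """Return (relationship, percent) pairs, most likely first."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)

def times_as_likely(probabilities, a, b):
    """How many times more likely relationship a is than relationship b."""
    return probabilities[a] / probabilities[b]

ranked = rank_buckets(buckets)
for name, pct in ranked:
    print(f"{pct}% - {name}")

# Ratios behind "more than twice" and "more than three times"
print(round(times_as_likely(buckets, "half second cousins", "full second cousins"), 2))  # 2.04
print(round(times_as_likely(buckets, "half second cousins", "full third cousins"), 2))   # 3.53
```

The ranked list is simply the order in which you might work through the candidates in a research plan.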

Where do these probability values come from? They are courtesy of Leah Larkin, who blogged about her number-crunching here: https://thednageek.com/the-limits-of-predicting-relationships-using-dna/ . Her work is based on a 2016 white paper published by Ancestry (see the link in Leah’s blog post) and uses numbers derived from simulated data. Think Sheldon, on The Big Bang Theory, and his theoretical calculations.

Image 3. Sheldon, theoretical science, The Big Bang Theory. Source: https://static.wikia.nocookie.net/bigbangtheory/images/e/eb/Bot2.jpg/revision/latest?cb=20121014122624

Side note: On the upper left of the Shared cM Tool webpage—unless you’re on your cell phone—there is a link for “Beta with updated probabilities”. The non-beta (i.e., default) view shows the probabilities that AncestryDNA displayed for a pair of matches at the time of their 2016 white paper. AncestryDNA has since revised their calculations, but they have not published a new white paper to explain their new reasoning. The beta probabilities reflect a more recent version of what Ancestry displays for a match—and Ancestry may make more unannounced changes to their calculations at any time. 

With no explanation from Ancestry about the science or math behind the change, DNAPainter keeps the white paper-supported version as the default, and makes the other available as a beta option.

Here’s one example of the different results posted for two test takers sharing 156 cM, where second cousins are shown as 2c and half second cousins as half 2c.

  • 2016 (default version): 53% probability for half 2c; 26% for 2c
  • 2020 (beta version): 51% probability for 2c; 40% for half 2c.

Which should you use? I use the default version. But I use these probabilities primarily when I don’t know how two test takers are related, to help direct my research efforts. If relationships in the top bucket don’t pan out, I move to the next bucket. The fact that the two versions can differ isn’t a big problem. The probabilities aren’t used to prove a hypothesis, but to help formulate more efficient research and testing plans.
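A small Python sketch can show why the version difference matters less than it might seem. It uses only the 156 cM figures quoted above for the default and beta views; the names are hypothetical, and the real tool lists more relationships than these two.

```python
# Figures quoted above for a 156 cM match; other buckets are omitted.
default_2016 = {"half 2c": 53, "2c": 26}
beta_2020 = {"2c": 51, "half 2c": 40}

def research_order(probabilities):
    """List relationships from most to least likely."""
    ordered = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]

print(research_order(default_2016))  # ['half 2c', '2c']
print(research_order(beta_2020))     # ['2c', 'half 2c']

# Both versions put the same two relationships at the top; only the order differs.
same_candidates = set(default_2016) == set(beta_2020)
print(same_candidates)  # True
```

Either way, the same two candidates end up on the research plan; the version you choose only changes which one you investigate first.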

What if you have a hypothesis: does the data support it?

You can certainly use the same percentages to begin assessing your hypothesis. Let’s look at a new example: say you have found a new DNA cousin sharing 39 cM with you, and your trees suggest a specific common ancestral couple that would make you and your match fourth cousins once removed (4c1r). The probability tool shown below in image 4 suggests that a 4c1r relationship falls in the top (most likely) bucket, so the size of the DNA match supports (but does not prove) your conclusion. You might just as easily be 8th cousins!

Image 4.

What if, however, your trees suggest a much more recent common ancestor, one that would make you second cousins once removed (2c1r)? The probability tool puts those odds at 4%. Does that mean someone’s tree is wrong?

No. When evaluating a hypothesis, especially if the probability percentage is low, it can be helpful to scroll down to the big chart below that shows data from the Shared cM Project—described by creator Blaine Bettinger as “a collaborative data collection and analysis project created to understand the ranges of shared cM associated with various known relationships.” [See Blaine Bettinger, “Version 4.0! March 2020 Update to the Shared cM Project!” blog post 27 March 2020, The Genetic Genealogist (https://thegeneticgenealogist.com/2020/03/27/version-4-0-march-2020-update-to-the-shared-cm-project/ : accessed 4 Aug 2021).]

While the percentages at the top of the Shared cM Tool webpage come from theoretical data, these Shared cM Project numbers come from empirical data. Think Leonard, on The Big Bang Theory.

Blaine has collected over 60,000 match-pair results so far (and is still collecting more data!). He has captured the reported relationship of each match pair and plotted the average shared cM amount for that relationship, along with a range of valid values. If you have input the size of the match into the probability tool above, the Shared cM Tool will fade the relationships that are not supported and allow you to click the box you are interested in. See image 6.

Image 6.

Notice that half-relationships, up to but not including half-4c, appear on the left.

In this case, if you click on 2c1r, you get the pop-up shown in Image 7.

Image 7.

The data for matches reported as 2c1r fall in an expected bell curve. A match of 39 cM sits on the shoulder of the curve. It shows that 500 matches were reported for 2c1r that shared between 26 and 50 cM. Our 39 cM match is not near the average for 2c1r, but it’s on a reasonable part of the bell curve shoulder; plenty of participants reported similar results. 2c1r is a credible prediction for that match.

If two test takers shared, say, 330 cM and were allegedly 2c1r, you can see that such a result would be much farther down a shoulder. The project only had 10 match pairs sharing 326-350 cM for reported 2c1r cousins. With an acknowledged risk that project data is self-reported and some entries may be flawed, you might want to challenge your hypothesis for 2c1r for a 330 cM match.
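The histogram lookup described above can be sketched in a few lines of Python. This is only an illustration: the 25 cM bin width is inferred from the two ranges mentioned in the text (26-50 and 326-350), the counts are the two quoted above, and the function name is hypothetical. For the full histogram, consult the Shared cM Project data itself.

```python
import math

# Counts quoted above for reported 2c1r pairs; all other bins are omitted here.
REPORTED_2C1R_COUNTS = {(26, 50): 500, (326, 350): 10}

def histogram_bin(cm, width=25):
    """Return the (low, high) bin for a shared cM value.

    Assumes bins of 1-25, 26-50, 51-75, and so on, which is an
    inference from the two ranges mentioned in the text.
    """
    if cm <= 0:
        raise ValueError("shared cM must be positive")
    low = ((math.ceil(cm) - 1) // width) * width + 1
    return (low, low + width - 1)

for cm in (39, 330):
    low, high = histogram_bin(cm)
    count = REPORTED_2C1R_COUNTS.get((low, high))
    print(f"{cm} cM falls in the {low}-{high} cM bin; reported 2c1r pairs: {count}")
```

Comparing the two counts (500 versus 10) is a quick way to see how far down the shoulder of the curve a given match sits.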

NOTE: the “Beta with updated probabilities” link described above only applies to the probabilities part of the tool. The Shared cM Project part of the tool is the same content whether using the default or beta probabilities.

I would encourage everyone using the Shared cM Project data to read more about the project here: https://thegeneticgenealogist.com/2020/03/27/version-4-0-march-2020-update-to-the-shared-cm-project/ . This takes you to a post that describes the methodology in detail, provides a link so that you can contribute your own match data to this invaluable crowd-sourced initiative, and offers a link to a PDF.

Here are some highlights from the PDF: Page 5 offers step-by-step guidelines on how to use the data. Page 6 defines meiosis groupings, which is another explanation for why some relationships are placed in the same buckets as other relationships. And page 7 contains a chart of those buckets or groupings, including average shared cM, and min/max range for that relationship group.

This chart also includes half-4c relationships (which the colorful online chart on the Shared cM Tool omits, for space reasons). If you want to estimate data for more distant half-relationships, it is fairly safe to treat a half-relationship like the corresponding once-removed relationship. For example, half-5c would be in the same bucket as 5c1r.
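That rule of thumb is simple enough to write down as a tiny Python helper. The function name and the label format ("half-5c", "5c1r") are my own assumptions for illustration; the rule itself is only the approximation described above.

```python
import re

def bucket_equivalent(relationship):
    """Map a half-cousin label such as 'half-5c' to its once-removed
    equivalent ('5c1r'), following the rule of thumb in the text."""
    match = re.fullmatch(r"half-(\d+)c", relationship)
    if match is None:
        raise ValueError(f"expected a label like 'half-5c', got {relationship!r}")
    return f"{match.group(1)}c1r"

print(bucket_equivalent("half-5c"))  # 5c1r
```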

Histograms like those shown in image 7 above are also available on later pages in this PDF.

The Shared cM Tool carries three important notes:

  • Relationships more distant than second cousins may not share any measurable DNA. The more distant the relationship, the more likely there will be no match.
  • Averages shown do not include those zero cM match results in their calculations, only positive values.
  • Endogamy or pedigree collapse can affect expected results; the statistics on this page assume neither applies.

I hope everyone is finding the Shared cM Tool as valuable as I do, and I hope this blog post might enlighten some readers and perhaps inspire some to explore all the content the page has to offer.

Many thanks to Jonny Perl for sanity-checking a draft of this post and suggesting some clarifications. And for providing us with this valuable tool!

Happy sleuthing!

ETA 18 Sep 2021: Debbie Parker Wayne has a great blog post on this too; hers goes into more detail on what you can do if the probabilities you are seeing are lower than expected. See Debbie Parker Wayne, “DNA Painter’s Shared cM Tool — Ranges, Probabilities, and Histograms,” Deb’s Delvings in Genealogy, 11 September 2021 (http://debsdelvings.blogspot.com/2021/09/dna-painters-shared-cm-tool-ranges.html : accessed 18 Sep 2021).

© September 2021, Ann Raymont, CG®

4 thoughts on “Shared cM Tool revisited”

  1. smpfamily

    Since few of my DNA match amounts had changed, I had not paid much attention to the differences. It is good to learn that others are, and the numbers matter. It gives me confidence in the probabilities. WATO is most interesting for me when showing that a particular relationship is not possible, statistically speaking.
