When is a match a false positive?

“The best laid schemes o’ mice an’ men / Gang aft agley”, as the poet [Robert Burns] says.

In other words, herein lies what worked in my project to identify a good cM threshold for “false positives”… and what didn’t.

Just as many of our ancestors braved new frontiers, those of us who are trying to use DNA in our genealogy find ourselves in a new frontier too. There are some guidelines and best practices and methodologies that are widely recommended and followed, and some that are controversial or evolving. As citizen scientists, we are all welcome to conduct studies with our own family data and see how well it aligns with “what the experts say”.

One of the most basic questions in genetic genealogy is whether a matching DNA segment is “large enough” to investigate further. Small segments of matching DNA may be “false positives”. There are two primary reasons why an alleged match may be of no use:

  1. The segment could be passed down from an ancestor too far back to identify. Some refer to this as population genetics, i.e. a piece of DNA that everyone from a certain area may have inherited from a common ancestor several hundred years ago. AncestryDNA uses a proprietary algorithm to try to identify these “pileup regions” and remove them from our matches. The other companies at this time do not. This isn’t exactly an invalid match, but it’s considered a false positive because it’s not genealogically relevant.
  2. Genotyping errors may occur. DNA testing companies look at alleles at specific locations (markers) in our chromosomes and compare those values to another tester’s DNA to see whether enough consecutive identical alleles constitute a match. There are several ways this can go awry,1 but the bigger the chromosome segment, the less likely these errors are to produce a false match.

Each testing company employs its own algorithms and thresholds to minimize false positives and yet not omit genuine matches. The International Society of Genetic Genealogy suggests that “In general, the larger the shared segments the more likely that the match is genuine. Half-identical matching segments of 15 cMs are mostly IBD [Identical by Descent, i.e. valid for genealogy], and the majority of matches between 10 cM and 15 cM are IBD. As the predicted matching segments get smaller the false positive rate increases.”2

One way genetic genealogists can check to see if a match between A and B is genuine is to see if B also matches A’s mother or A’s father. (A doesn’t have any DNA that didn’t come from her mother or father. But a genotyping error could cause it to look like something different.) If B does not match either of A’s parents, then this can be a false positive match.3
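The parent check described above amounts to a set difference. Here is a minimal Python sketch, assuming each person's match list has already been reduced to a set of match identifiers. The function name and the sample identifiers are hypothetical and do not reflect any testing company's export format.

```python
def unmatched_by_parents(child_matches, mother_matches, father_matches):
    """Return the child's matches that appear in neither parent's match list.

    These are the candidates for false positives (or, less often, for a
    false negative on a parent's kit).
    """
    return set(child_matches) - set(mother_matches) - set(father_matches)


# Hypothetical identifiers: B matches the mother, C the father, D neither.
child = {"B", "C", "D"}
mother = {"B"}
father = {"C"}

print(sorted(unmatched_by_parents(child, mother, father)))  # ['D']
```

A match that survives this filter is only a suspect, not a proven false positive.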

At the Salt Lake Institute of Genealogy in January 2016, Angie Bush reported the results of her study on the DNA matches of a two-parent-one-child trio. With FamilyTreeDNA data, she found that 94-95% of A’s matches over 10 cM were valid, i.e. they also matched one of A’s parents. However, of the matches where the largest segment was 7-10 cM, nearly a third of them did not match either parent: 32.4% were false positives. Unlike FamilyTreeDNA, AncestryDNA “phases” the data to improve the quality of the match list. At AncestryDNA, Angie’s test found that over 97% of the matches over 10 cM were shared with a parent, and over 91% of those between 7 cM and 10 cM matched a parent. At 6 to 7 cM, there was a serious drop-off: more than 80% were false positives. (Thanks, Angie, for permission to share!)

As a fledgling citizen scientist, I wanted to test this within my own family. Previously, I hadn’t seen any value in testing my kids—they weren’t going to have any ancestors that their parents didn’t have. But now I was tempted to see what having a two-parent-one-child trio could tell me. I asked Dan if he would donate a DNA sample to the cause and he agreed. His father and I had already tested at AncestryDNA.

And here’s where my plans “gang agley”. AncestryDNA introduced their new V2 chip after I mailed in Dan’s DNA sample but before it was processed. In basic terms, this means that previously, with the V1 chip, AncestryDNA was looking at about 700,000 SNPs (Single Nucleotide Polymorphisms, i.e. markers or locations) on the chromosomes and comparing them to the same areas in the chromosomes of other testers. Now, though, AncestryDNA is looking at about 460,000 of the old SNPs and another 200,000+ new SNPs.

So it may be hypothetically possible for Dan to have an IBD match (Identical by Descent, i.e. valid for genealogical purposes) with another recent tester who also tested on the V2 chip, and that person might not match Dan’s parents, because the SNPs where they match were not processed for Dan’s mom or dad.

Using the DNAGedcom client,4 I downloaded all of Dan’s matches. He had 4630 matches over 6 cM. Then I downloaded the parents’ matches and used Excel to strip out the duplicates, where Dan’s match also matched one of his parents. The result?

2-Parent 1-Child DNA Analysis

Size of match   # of Dan’s matches   # that don’t match parents   % false
all                           4630                         1635    35.3 %
15+ cM                         232                            4     1.7 %
10-15 cM                       690                           33     4.7 %
7-10 cM                       1846                          453    24.5 %
6-7 cM                        1862                         1145    61.5 %
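For anyone who wants to repeat the arithmetic with their own trio, here is a small Python sketch that recomputes the “% false” column from the counts in the table above. The variable and function names are my own. One caveat: 33 of 690 works out to 4.78%, which rounds to 4.8% rather than the 4.7% shown, so that table entry appears to have been truncated rather than rounded.

```python
# Counts copied from the table above:
# (size of match, number of Dan's matches, number that don't match a parent)
TABLE = [
    ("15+ cM", 232, 4),
    ("10-15 cM", 690, 33),
    ("7-10 cM", 1846, 453),
    ("6-7 cM", 1862, 1145),
]


def false_rate(total, unmatched):
    """Percentage of matches that are not shared with either parent."""
    return 100.0 * unmatched / total


# The "all" row is simply the sum of the four size bins.
total_all = sum(total for _, total, _ in TABLE)
unmatched_all = sum(unmatched for _, _, unmatched in TABLE)

for label, total, unmatched in TABLE:
    print(f"{label:>9}: {false_rate(total, unmatched):.1f} %")
print(f"{'all':>9}: {false_rate(total_all, unmatched_all):.1f} %")
```

The bin totals add up to the 4630 matches and 1635 unmatched reported in the “all” row.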

It’s important to remember that some of those false positives could hypothetically be genuine matches, if they tested on the V2 chip and matched Dan on SNPs that were not processed for Dan’s parents. I randomly messaged a dozen of these “false positive” matches over 10 cM. So far, only one has replied, and she confirmed that her DNA test was new, apparently with the V2 chip. It is premature to argue that this is an IBD match; more research is needed. But it may be possible. (ETA July 24, 2016: I heard from another match between 10 and 12 cM: she tested on the old V1 chip in January 2016 and matched Dan but neither parent. This is most likely a false positive.)

It’s unlikely that the chip change explains most of the false positives in the table. I downloaded Dan’s data less than a month after AncestryDNA rolled out the V2 chip, so only testers whose samples were also processed that month on the new chip could be valid matches of this kind. In addition, those individuals would need to match Dan in the new SNP regions. Finally, the match would have to be free of genotyping errors, and at this time I can’t verify that without parent matches. Nevertheless, the chip change could mean that Dan’s false positive figures appear more pessimistic than actually warranted. This could be a reason why Angie Bush’s results with AncestryDNA data were more promising.

It’s also conceivable that some of Dan’s “false positives” are really “false negatives” in a parent, i.e. the parent really does match but some processing glitch on the parent’s kit makes it look like a mismatch. Since my goal is to identify a threshold where I am confident the match is reliable, I’m not going to worry about those.

Conclusions?

As noted at the beginning of this article, a false positive may be due to a DNA segment passed down from outside a genealogical timeframe, e.g. due to population genetics. AncestryDNA’s process attempts to mitigate this issue. The second problem is when genotyping errors make it appear that A and B match, when in fact, B doesn’t match either of A’s parents there, and therefore the match isn’t valid. Is there a certain match size threshold where we can rely on the match being IBD?

  • Like the ISOGG wiki, I’m very confident about matches over 15 cM. All of Dan’s matches over 15.5 cM matched a parent, regardless of the chip. The four matches between 15 and 15.5 cM that didn’t match a parent had logged in during the first week of June and may have matched Dan on the new V2 SNP regions. In any case, at least 98% of Dan’s matches over 15 cM matched a parent and appear genuine. This range covers all projected relatives that AncestryDNA rates as 4th-6th cousins or closer, plus some distant cousins.
  • Between 10 and 15 cM, Dan’s AncestryDNA matches matched a parent over 95% of the time, regardless of chip. These are likewise worth my time to pursue, although the common ancestor may be further back than my tree goes.
  • Between 7 and 10 cM, about 75% of Dan’s matches appear genuine, and that figure could be low due to the chip change. In Angie Bush’s study, over 90% of the AncestryDNA matches in this range were genuine. At this time, I pursue these only if they appear in a network (In Common With, Shared Matches, Triangulation Group, etc.) with other matches on a brick wall I’m currently investigating. With matches this size, I don’t assume the match is reliable, but it’s worth exploring further.
  • Between 6 and 7 cM, Dan’s results indicate that these matches are more likely to be false than genuine: about 61% did not match a parent. In Angie’s AncestryDNA results, fewer than 20% were genuine. Each genealogist has to decide for himself/herself whether these are worth chasing when so many of these predicted matches are actually false; there is no one right answer for everyone. I don’t invest time on these.
  • These conclusions apply only to AncestryDNA results. While I have immediate family who have tested at FamilyTreeDNA and 23andMe, I don’t have two-parent-one-child trios at those companies.
  • It may be possible for someone to match a child but neither parent and still be a legitimate match—at least, if the child and the match tested on a different chip than the parents, and matched on the different SNP regions that weren’t used in the parents’ chip. There still may or may not be genotyping errors on these matches, esp. on the segment matches under 10 cM. More research is needed.
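The personal thresholds in the list above could be written down as a simple triage helper. This is only a sketch of my own working rules for AncestryDNA matches, not a standard. The function name, the return labels, and the exact boundary handling (>= at 15, 10, and 7 cM) are assumptions, since the list only gives ranges.

```python
def triage(match_cm, in_network=False):
    """Suggest how to treat an AncestryDNA match of the given size in cM.

    in_network: True if the match also appears in a network (In Common With,
    Shared Matches, Triangulation Group, etc.) on a line being researched.
    Boundary handling (>=) is an assumption; the list above only gives ranges.
    """
    if match_cm >= 15:
        return "pursue"
    if match_cm >= 10:
        return "pursue; common ancestor may predate the tree"
    if match_cm >= 7:
        return "explore" if in_network else "set aside"
    return "skip"


# Hypothetical match sizes, for illustration only.
for cm, net in [(22.0, False), (12.6, False), (7.3, True), (7.3, False), (6.4, False)]:
    print(f"{cm:>5} cM, in network={net}: {triage(cm, net)}")
```

Other genealogists may reasonably draw these lines elsewhere, especially below 10 cM.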

My family case study didn’t really change my thresholds for useful segments, but it gave me confidence that what some experts are recommending for reliable match size is in line with what I have seen in my own family. But it would have been more clear-cut if I could have had Dan’s DNA processed with the V1 chip! [mental hashtag #darn.that.V2.chip.timing!]

Citations

1. For example, see Ann Turner, “Satiable Curiosity: Identity Crisis: Identical by State or Identical by Descent?”, Journal of Genetic Genealogy, Fall 2011, vol. 7, (http://www.jogg.info/72/files/Turner.htm : accessed 7 July 2016).

2. See International Society of Genetic Genealogy Wiki, “Identical by Descent” (http://isogg.org/wiki/Identical_by_descent : accessed 25 Apr 2016) for more information.

3. See International Society of Genetic Genealogy Wiki, “False positive matches” (http://isogg.org/wiki/Identical_by_descent#False_positive_matches : accessed 25 Apr 2016) for more information.

4. Rob Warthen, DNAGedcom, “Welcome to the DNAGedcom Client” (https://www.dnagedcom.com/doc/welcome-to-the-dnagedcom-client/ : accessed 4 Jul 2016).

Ann Raymont © 2016

11 thoughts on “When is a match a false positive?”

  1. Pingback: Citizen Science in genetic genealogy | DNAsleuth

  2. Sheila Irene Hughes Crone

    You are a scientist. I’m just beginning to understand cMs. I am understanding that I don’t know which company to believe. There is no simple answer and with all of the sales tactics pushing ‘civilians’ to test more and more I feel that I can never keep up with bona fide scientists. So what if I get my DNA done? What does that tell me, if anything? I’ve uploaded to a total of four companies and I’m told different things. Some tell me a DNA Circle means nothing even though I recognize many in the circle. FTDNA tells me Ancestry has used skewed files while other companies accept them. Not a problem for advanced researchers, but a huge problem for novices.

  3. dnasleuth Post author

    Hi, Sheila! I know, it can be overwhelming for ‘newbies’. Sorry you’re finding it so frustrating. Have you read Blaine Bettinger’s book, The Family Tree Guide to DNA Testing and Genetic Genealogy? You may find it helpful.

    Ancestry’s DNA circles *are* worthwhile. (Unless some in the circle have mistakes in their trees.) Their ‘new ancestor discoveries’ (which are in beta test) aren’t reliable, but do use the circles! And FamilyTreeDNA does accept transfers from Ancestry again, so I’m not sure what you heard about skewed files. I think all three of the main companies are pretty accurate with their matches with at least one segment over 10 or esp. 15 cM.

    Your DNA results should connect you with biological cousins, who might know more about your shared ancestor and his or her line than you do. Or the match might give you new places to look and surnames to consider. It definitely takes time to get a handle on all this, but I hope you soon find it worthwhile – and fun!

    1. Sheila Irene Hughes Crone

      Thank you for replying. I was feeling decidedly overwhelmed. FTDNA are the ones who said they could not accept my upload from AncestryDNA because much of my file was skewed and offline. AncestryDNA said it was not so when I showed them FTDNA’s emailed response. Naturally, FTDNA said Ancestry was wrong. So I dropped the matter and uploaded to My Heritage and then to GEDMATCH. I’ve had some issues from using an iPad Pro. Ancestry doesn’t work well with the Safari browser which most Apple products use. They wanted me to get another browser. I did. GEDMATCH just let me know about the screen resolution. Just technical issues. AncestryDNA has improved its ability to use mobile devices, but on one of the sites for Tips and Help for beginners through professional, some hotshots were disagreeable about Ancestry and some others “dumbing down” for mobile users. I think that got taken care of. Overall, people are very helpful and kind such as yourself. So thank you again. Sheila

  4. Pingback: How reliable is our DNA evidence? | DNAsleuth

  5. Thomas Wilson

    I have found two matches at 12.6 and 7.3 cM to a third person on MyHeritage. Two are documented cousins by a 1728-1790 ancestor in Virginia. The 3rd person is a descendant of the Virginia ancestor’s possible brother. Is it fair to estimate the probability of both matches being false as ~0.05 × 0.25 = 0.0125? Does this imply a confidence level of ~99% that they are IBD?

  6. dnasleuth Post author

    Hi! I would say that — based on my own experience — the 12.6 cM segment has a 95% likelihood of being IBD. I wasn’t clear if you are saying that the 7.3 cM match is within that same segment, i.e. the three people triangulating? In any case, I’d be cautious about drawing conclusions when a hypothesized ancestor is that far back. The 7.3 cM match could still be IBS. There’s not enough evidence that the 12.6 cM segment came from the hypothesized common ancestor and not someone else entirely on a different branch; perhaps one that isn’t documented back that far.

  7. Thomas Wilson

    Really appreciate your rapid reply. I’ve been sleuthing this connection on and off since 2011. No triangulation; the segments are different. Brief background: the two matches in question are my American cousins, who are verified to descend from our common 4th/5th great-grandfather, one Joseph Howe (SW VA, c. 1728-1790). According to the genealogy text “Family of Hoge” by James Hoge Tyler (VA Governor 1898-1902), a great-grandson of Joseph Howe, Joseph was the brother or 1st cousin of the English Viscounts Howe, Richard and William, who led the British military forces during the American Revolution. Joseph was on the American side (a documented young surveyor and friend of George Washington during 1749 in VA). This may explain why family relations were severed, with no documentation even of Joseph’s birth. I’ve recruited the descendant of Richard Howe, the current 7th Earl Howe, to participate in my research. He allows me to manage his MyHeritage site. I’ve built a tree using many of Joseph’s descendant branches (I descend from his daughter) and have inserted Richard Howe as his father. Then I’ve added all of the links over and down to Earl Howe. Finally, I’ve used the “SmartMatch” filter. I find ~23 matches, but only two have replied to my messages and had their trees verified with me (they descend from two other daughters of Joseph, Anne Howe Pearis/Paris and Elizabeth Howe Hoge). One is in her 90s and, if my assumptions are correct, is the 6C of Earl Howe; the other is in her 60s, a 6C1R. The Earl will soon be on Ancestry (10x as large a database as MH), and then I hope to find more, possibly using their Thru-Lines filter (provided any matches have trees reaching one generation earlier than Joseph). Earl Howe’s pedigree of course extends many generations earlier.

  8. Pingback: Small cM DNA Matches | Heartland Genealogy

  9. Pingback: a serendipitous match | DNAsleuth
