Estimated dating of Y-chromosome events

(by James Dow Allen)

Information in the following chart is taken from

Click here for a close-up of R1b based on this on-line chart.

The Francalacci phylogeny tree seems to be one of the widest-coverage Y-chromosome trees shown on-line. Assuming that the rate of SNP mutations is nearly constant and has been estimated, rough date estimates of clade splits can be read directly from his SNP chart. I've done that here, with my own arbitrary estimate of the mutation rate imposed.
 

The chart is a modified version of Figure 1 of the Francalacci paper. I have made only the following modifications to that chart:

The SNP counts shown in a chart like this will vary from study to study (especially when ancient skeletons are involved) due to genome reading difficulties. One might expect, I think, that the SNP counts from two studies would be proportional along the same branches. Here they are not: the SNP distance from R-L to R-Q is 119 from Francalacci but 39 (about 1/3 as much) from Raghavan, while the R-Q to R1a-R1b distances are 43+38 from Francalacci and 19+5+18 (about 1/2 as much) from Raghavan. Is this within normal statistical variation? Or is my interpretation flawed?
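The comparison above can be tabulated in a few lines. This is only a sketch of the arithmetic using the counts quoted in the text; the dictionary keys and the helper name `count_ratios` are my own labels, not anything taken from either paper.

```python
# SNP counts quoted in the text for the same two branch segments.
francalacci = {"R-L to R-Q": 119, "R-Q to R1a-R1b": 43 + 38}
raghavan = {"R-L to R-Q": 39, "R-Q to R1a-R1b": 19 + 5 + 18}


def count_ratios(a, b):
    """Return a[segment] / b[segment] for every segment in a."""
    return {segment: a[segment] / b[segment] for segment in a}


ratios = count_ratios(francalacci, raghavan)
for segment, ratio in ratios.items():
    print(f"{segment}: Francalacci/Raghavan = {ratio:.2f}")
```

If the two studies sampled proportional subsets of the chromosome, the two ratios would be roughly equal; here they come out near 3 and near 2.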

The SNP distance from an R* to the Siberian boy is shown as "35 private non-N" in the Raghavan Figure. Am I correct to compare this count with the "a"+"d" counts shown for modern genomes in the Figure?

The date estimates I provide are not justified by any specific evidence and should be treated as "wild guesses" just to start discussion. However, I think they may be about right. The sudden emergence of haplogroups C, F, D and E from BT appears just before the alleged Toba supereruption. The rapid division of F into G,H,...,T coincides with a plausible population expansion near West or South Asia. Furthermore, this estimate places the Western Europe split-up of R1b-L11 near 3000 BC, about the date of Corded Ware intrusion into Western Europe and/or the date of Bell Beaker expansion.

Francalacci calibrates the mutation rate of his genome subset as 205 years per SNP. I have used 165 years per SNP for the years I show.

Click here to view the same chart, but with the time axis showing 140 years per Francalacci SNP, instead of 165 years. Now the BT breakup occurs after the Toba supereruption, which is perhaps more plausible.
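Since every date on the chart is just an SNP count multiplied by an assumed rate, changing the rate rescales all dates by the same factor. A minimal sketch follows; the SNP count of 100 is a made-up illustrative value, not a number read from the chart, and the function name `years_before_present` is hypothetical.

```python
# Years per SNP (Francalacci's subset) discussed in the text.
RATES = {"Francalacci": 205, "this page": 165, "alternate chart": 140}


def years_before_present(snp_count, years_per_snp):
    """Age of a node that lies snp_count SNPs above the present."""
    return snp_count * years_per_snp


EXAMPLE_SNP_COUNT = 100  # hypothetical count, for illustration only

ages = {name: years_before_present(EXAMPLE_SNP_COUNT, rate)
        for name, rate in RATES.items()}
for name, age in ages.items():
    print(f"{name}: {age} years before present")
```

Going from 165 to 140 years per SNP moves every date about 15% closer to the present.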

Although Francalacci considers an ancient skeleton (Ötzi the Iceman), coverage of that DNA was inadequate to place it accurately, so he calibrates his mutation rate by assuming that the sudden fanout of I2a1a1 in Sardinia coincides with the sudden expansion of farming on that island. I think it's just as likely that the fanout was associated with a later "secondary founder" event. (As his own paper may admit, the 205-year estimate represents more of an upper bound than an expected value for the Ötzi data.)

Ignoring statistical variation, the mutation rate can be read directly from the Raghavan chart. For example, 35 "private non-N" SNPs separate the Siberian boy from a defined node; 94 "d"+"a" SNPs separate a modern R2 specimen from the same node. Dividing the boy's age of about 24000 years by the difference of 59 (94-35) gives 407 years per Raghavan SNP. To convert this to Francalacci's SNP subset, we can compare the two counts along the path from R-L to modern R2: Francalacci 119+43+209 (=371) against Raghavan 39+24+89 (=152). Dividing 407 years by (371/152), we conclude that the rate (normalized to Francalacci's subset) implied by Raghavan data is 167 years per SNP. Obviously this calculation should be averaged over various paths, is fraught with statistical uncertainty, and should be done by someone who understands Raghavan data better (perhaps using "d"+"a"+"N" counts instead of "d"+"a").
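The calibration just described is short enough to write out. This is only a sketch of the arithmetic in the paragraph above, with variable names of my own choosing; it uses a single path and ignores statistical variation, just as the text does.

```python
# Counts and age as quoted in the text.
SIBERIAN_BOY_AGE_YEARS = 24000
boy_private_snps = 35   # "private non-N" SNPs, node to Siberian boy
modern_r2_snps = 94     # "d"+"a" SNPs, same node to a modern R2 specimen

# SNPs the modern lineage accumulated after the boy died.
extra_snps = modern_r2_snps - boy_private_snps
years_per_raghavan_snp = SIBERIAN_BOY_AGE_YEARS / extra_snps

# Same path (R-L to modern R2) counted in each study.
francalacci_path = 119 + 43 + 209
raghavan_path = 39 + 24 + 89

# Francalacci sees more SNPs on the same path, so each one spans fewer years.
scale = francalacci_path / raghavan_path
years_per_francalacci_snp = years_per_raghavan_snp / scale

print(f"{years_per_raghavan_snp:.0f} years per Raghavan SNP")
print(f"{years_per_francalacci_snp:.0f} years per Francalacci SNP")
```

Averaging over several paths would only require repeating the last step with other pairs of path counts.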

The path-lengths to present are consistent enough in the Raghavan data to suggest that its sample size is large enough to draw conclusions. Proceeding in a different way, the distances from the F-breakup to the Siberian boy and to the present are 98 and 155 SNPs respectively, so the F-breakup should be (24000 * 155/(155-98)) years ago, or about 65000 BP.
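The F-breakup estimate is a simple proportion: the SNPs accumulated since the boy's death fix the rate, and the full path length then gives the age. A sketch with my own variable names, using only the numbers quoted above:

```python
SIBERIAN_BOY_AGE_YEARS = 24000
snps_f_to_boy = 98       # F-breakup to Siberian boy
snps_f_to_present = 155  # F-breakup to present

# SNPs accumulated during the 24000 years since the boy lived.
snps_since_boy = snps_f_to_present - snps_f_to_boy

# Implied rate on this path, then the age of the F-breakup.
implied_years_per_snp = SIBERIAN_BOY_AGE_YEARS / snps_since_boy
f_breakup_years_ago = implied_years_per_snp * snps_f_to_present

print(f"implied rate: {implied_years_per_snp:.0f} years per Raghavan SNP")
print(f"F-breakup: about {f_breakup_years_ago:.0f} years ago")
```

The rate implied on this path is about 421 years per Raghavan SNP, which is close to the 407 figure obtained from the R2 path.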

Thus, despite caveats, I think that, interpreted with more understanding than I have, the Raghavan data should be adequate to form a good ballpark estimate of mutation rate, and to estimate the date of the R1-R2 split.

As far as I can tell, the Y-chromosome SNP rate is still controversial, with estimates typically 100 to 200 years per SNP when normalized to Francalacci's subset. (In round figures, I think Francalacci's subset is about 17% of the NRY, the non-recombining part of the Y-chromosome, so his 205 years per SNP becomes about 35 years per SNP for the entire NRY.)
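The normalization in the parenthesis works because a larger region collects SNPs proportionally faster. A small sketch, assuming the 17% round figure given above; the function name `whole_nry_rate` is hypothetical.

```python
SUBSET_FRACTION = 0.17  # assumed share of the NRY covered by Francalacci's subset


def whole_nry_rate(years_per_subset_snp, fraction=SUBSET_FRACTION):
    """Convert years per SNP in the subset to years per SNP over the whole NRY."""
    return years_per_subset_snp * fraction


for subset_rate in (100, 165, 200, 205):
    print(f"{subset_rate} years/SNP in subset -> "
          f"{whole_nry_rate(subset_rate):.1f} years/SNP over the NRY")
```

On this basis the 100 to 200 year range corresponds to roughly 17 to 34 years per SNP over the whole NRY.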