Serious concerns

What is the validity of this article? For one thing, ethnic groups are not biologically valid populations. Is the criterion an ethnic group or a nation? These are not necessarily the same thing. What about groups that have been sampled numerous times from numerous different populations, each of which has a different frequency of haplogroups/haplotypes? Is it OR to pool samples from different studies? Can we do this when different studies genotype for different SNP mutations (and when the mutations that define haplogroups change constantly)? I've had a look at the Welsh data and they are not accurate at all. Wilson et al. (2001) is cited as the source for 89% of Welsh samples (Anglesey) being R1b, but Wilson et al. (2001) didn't genotype for R1b. The marker M343, currently used to define R1b, was probably not available at that time, and even if it was, it didn't define R1b then; for a time P25 defined R1b. This group probably represents P(xR1a1a), i.e. a genotyping for 92R7 (haplogroup P) followed by a genotyping of that set for M17 (R1a1a), so this "R1b" is actually "samples +ve for 92R7" minus "samples +ve for M17". For another thing, the article states that these samples are derived from Anglesey, but that applies only to the samples cited to Wilson et al. (2001). The haplogroup I samples cite Rootsi et al. (2004), but Rootsi cites Capelli et al. (2003) for these samples. Capelli et al. (2003) collected samples from three Welsh populations (Llangefni, Llanidloes and Haverfordwest), and Rootsi et al. (2004) pooled these data to produce a large meta-sample. The original three populations were not identical in their proportions of haplogroups. Furthermore, there is an additional paper that sampled from Wales: Weal et al. (2002) sampled from Llangefni and Abergele. These papers genotyped for different Y-SNPs because different SNPs were available at different points in time.
There are two ways we can show these data. We can pool all "Welsh" data from the three different studies (Wilson et al. (2001), Weal et al. (2002) and Capelli et al. (2003)) and pretend that the result has some biological meaning. This produces the problem that in Weal et al. (2002) (and Wilson et al. (2001)) Hg 1 is actually P(xR1a1) (genotyped for 92R7, excluding sub-groups of SRY1532.2/SRY10831.2, which defines group R1a1), which is not really R1b, though most of its members probably belong to R1b. Capelli et al. (2003), on the other hand, genotype for M173 (R1) and M17 (R1a1a), which produces haplogroup R1(xR1a1a); again it's not really R1b, though most of its members probably do belong to R1b. Personally I think it is very close to OR if we claim that these are R1b; who can defend this? If we do this, are we then saying that a nation/ethnic group is a biologically valid population? Effectively we are then claiming that this population is homogeneous, but we already know that it is not. Alternatively, we can display the data separately: a Welsh (Abergele) row, a Welsh (Llangefni) row, a Welsh (Haverfordwest) row, etc. That would be more valid, but then these populations are not ethnic groups, and it doesn't really solve the problem of the haplogroup designation. It would be better to have a list of SNP markers genotyped rather than a list of haplogroups. After all, haplogroups change constantly, usually at least once a year at the present rate, but the SNP markers do not change; only the haplogroup that a marker defines changes. So if M17 has been genotyped and is present, it is always there, irrespective of whether M17 defines haplogroup R1a, R1a1 or R1a1a (and it has been used to define all of these groups at different periods of time). Actually this article rather scares me: it's trying to pretend that ethnic groups are biological units, something I think most anthropologists would be very sceptical about.
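To make the pooling concern concrete, here is a minimal Python sketch. It takes the two Welsh rows attributed to Weal et al. (2002) in the example table further down (Llangefni-1 and Abergele, 92R7 column) and computes a sample-size-weighted pooled percentage. The function name `pooled_percentage` and the tuple layout are my own illustrative choices, not anything from the cited papers.

```python
# (population, sample size n, % in the 92R7 column) taken from the
# Weal et al. (2002) rows of the example table; layout is illustrative only.
samples = [
    ("Llangefni-1", 80, 88.8),
    ("Abergele", 18, 55.6),
]


def pooled_percentage(rows):
    """Return the sample-size-weighted percentage across all rows."""
    total_n = sum(n for _, n, _ in rows)
    carriers = sum(n * pct / 100 for _, n, pct in rows)
    return 100 * carriers / total_n


pooled = pooled_percentage(samples)
print(f"pooled 'Welsh': {pooled:.1f}%")
for name, n, pct in samples:
    print(f"{name}: {pct:.1f}% (n={n})")
```

The single pooled figure sits between two quite different local values and is dominated by the larger sample, which is exactly the heterogeneity a single "Welsh" row would hide. This sketch pools only rows that share the same marker scheme; pooling across studies with different markers would add the labelling problem on top.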

So I suggest that we have a rethink. I support having a list of populations (however they are defined in the paper being cited) and the frequency of each defined mutation within that population. That would be less confusing, less open to OR and more transparent.

E.g.

| Population | Language | n[1] | SRY10831.1 | SRY4064 | M35 | M89 | 12f2.1 | M172 | M170 | M26 | M9 | Tat | 92R7 | M173 | M17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llangefni-1[2] | Brythonic | 80 | 3.8[3] | 3.8 | NT | NT | 1.3 | NT | NT | NT | 1.3[4] | 0 | 88.8[5] | NT | 1.3 |
| Abergele[2] | Brythonic | 18 | 5.6[3] | 38.9 | NT | NT | 1.3 | NT | NT | NT | 0[4] | 0 | 55.6[5] | NT | 0 |
| Ashbourne[2] | Anglic | 54 | 22.2[3] | 5.6 | NT | NT | 3.7 | NT | NT | NT | 0[4] | 0 | 64.8[5] | NT | 3.7 |
| Southwell[2] | Anglic | 70 | 18.6[3] | 5.7 | NT | NT | 5.7 | NT | NT | NT | 0[4] | 0 | 64.3[5] | NT | 5.7 |
| Bourne[2] | Anglic | 12 | 33.3[3] | 0 | NT | NT | 0 | NT | NT | NT | 0[4] | 0 | 66.7[5] | NT | 0 |
| Fakenham[2] | Anglic | 53 | 41.5[3] | 1.9 | NT | NT | 0 | NT | NT | NT | 0[4] | 0 | 56.6[5] | NT | 0 |
| North Walsham[2] | Anglic | 26 | 30.8[3] | 3.8 | NT | NT | 3.8 | NT | NT | NT | 0[4] | 0 | 57.7[5] | NT | 3.8 |
| Friesland[2] | Frisian | 94 | 34[3] | 2.1 | NT | NT | 1.1 | NT | NT | NT | 0[4] | 0 | 55.3[5] | NT | 7.4 |
| Norway[2] | North Germanic | 83 | 44.6[3] | 1.2 | NT | NT | 2.4 | NT | NT | NT | 0[4] | 3.6 | 26.5[5] | NT | 21.7 |
| Shetland[6] | North Germanic | 63 | NT | NT | 0 | 0[7] | 0[8] | 0 | 10[9] | 0 | 0[10] | 0 | 0[11] | 66[12] | 23 |
| Orkney[6] | North Germanic | 121 | NT | NT | 0 | 0[7] | 0[8] | 0 | 14[9] | 1 | 0[10] | 0 | 2[11] | 64[12] | 19 |
| Durness[6] | Goidelic | 51 | NT | NT | 0 | 0[7] | 0[8] | 0 | 14[9] | 0 | 0[10] | 0 | 0[11] | 80[12] | 6 |
| Western Isles[6] | Goidelic | 88 | NT | NT | 0 | 0[7] | 0[8] | 0 | 25[9] | 0 | 0[10] | 0 | 0[11] | 66[12] | 9 |
| Stonehaven[6] | Anglic | 44 | NT | NT | 0 | 2[7] | 0[8] | 0 | 13[9] | 0 | 0[10] | 0 | 0[11] | 79[12] | 5 |
| Pitlochry[6] | Goidelic | 41 | NT | NT | 0 | 0[7] | 0[8] | 7 | 10[9] | 0 | 0[10] | 0 | 0[11] | 80[12] | 2 |
| Oban[6] | Goidelic | 41 | NT | NT | 0 | 2[7] | 0[8] | 0 | 7[9] | 0 | 0[10] | 0 | 0[11] | 86[12] | 4 |
| Morpeth[6] | Anglic | 95 | NT | NT | 0 | 2[7] | 1[8] | 3 | 18[9] | 0 | 0[10] | 0 | 0[11] | 73[12] | 3 |
| Penrith[6] | Brythonic | 90 | NT | NT | 3 | 1[7] | 0[8] | 2 | 18[9] | 0 | 0[10] | 0 | 0[11] | 68[12] | 8 |
| Isle of Man[6] | Goidelic | 62 | NT | NT | 2 | 0[7] | 0[8] | 0 | 16[9] | 0 | 0[10] | 0 | 0[11] | 70[12] | 13 |
| York[6] | Anglic | 46 | NT | NT | 4 | 2[7] | 0[8] | 0 | 32[9] | 0 | 0[10] | 0 | 0[11] | 57[12] | 4 |
| Southwell[6] | Anglic | 70 | NT | NT | 6 | 0[7] | 0[8] | 6 | 28[9] | 0 | 0[10] | 0 | 0[11] | 64[12] | 5 |
| Uttoxeter[6] | Anglic | 94 | NT | NT | 4 | 1[7] | 0[8] | 4 | 18[9] | 0 | 0[10] | 0 | 0[11] | 71[12] | 2 |
| Llanidloes[6] | Brythonic | 57 | NT | NT | 5 | 4[7] | 0[8] | 2 | 19[9] | 0 | 0[10] | 0 | 0[11] | 66[12] | 2 |
| Llangefni-2[6] | Brythonic | 80 | NT | NT | 4 | 0[7] | 0[8] | 1 | 4[9] | 0 | 1[10] | 0 | 0[11] | 89[12] | 1 |
| Rush | Goidelic | 76 | NT | NT | 0 | 0[7] | 0[8] | 0 | 8 | 3 | 0[10] | 0 | 0[11] | 86[12] | 4 |
  1. Sample size
  2. Weal et al. (2002)
  3. Excluding 12f2.1, YAP and M9 and their sub-groups, making it haplogroup BT(xDE,J,K)
  4. Excluding SRY465, Tat, 92R7 and M20 and their sub-groups, making it haplogroup K(xL,N1c,O2b,P)
  5. Excluding SRY1532.2/SRY10831.2 and its sub-groups, making it haplogroup P(xR1a1)
  6. Capelli et al. (2003)
  7. Excluding 12f2.1, M170 and M9 and their sub-groups, making it haplogroup F(xI,J,K)
  8. Excluding M172 positive samples, making it haplogroup J(xJ2)
  9. Excluding M26 positive samples, making it haplogroup I(xI2a2)
  10. Excluding Tat and 92R7 and their sub-groups, making it haplogroup K(xN1c,P)
  11. Excluding M173 and its sub-groups, making it haplogroup P(xR1)
  12. Excluding M17 and its sub-groups, making it haplogroup R1(xR1a1a)
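As a rough illustration of how a marker-based table like the one above could be used, the following Python sketch stores two of its rows as marker-to-value records and lists the markers that both studies report. I am assuming that NT means "not tested" and encoding it as `None`; the helper names `row` and `shared_markers` are hypothetical.

```python
# Marker columns in the same order as the table header.
MARKERS = ["SRY10831.1", "SRY4064", "M35", "M89", "12f2.1", "M172",
           "M170", "M26", "M9", "Tat", "92R7", "M173", "M17"]


def row(values):
    """Pair one table row with MARKERS; 'NT' (assumed: not tested) becomes None."""
    cells = values.split()
    assert len(cells) == len(MARKERS)
    return {m: (None if v == "NT" else float(v)) for m, v in zip(MARKERS, cells)}


# Values copied from the table, footnote markers dropped.
llangefni_1 = row("3.8 3.8 NT NT 1.3 NT NT NT 1.3 0 88.8 NT 1.3")  # Weal et al. (2002)
llangefni_2 = row("NT NT 4 0 0 1 4 0 1 0 0 89 1")                  # Capelli et al. (2003)


def shared_markers(a, b):
    """Markers reported (not NT) in both records, in header order."""
    return [m for m in MARKERS if a[m] is not None and b[m] is not None]


shared = shared_markers(llangefni_1, llangefni_2)
print(shared)
```

Even for the shared columns the footnotes still matter: the 92R7 value is P(xR1a1) in the Weal et al. (2002) row but P(xR1) in the Capelli et al. (2003) row, so 88.8 and 0 are not measurements of the same thing. A record like this makes that visible instead of hiding it behind a single haplogroup label.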