Sunday, July 3, 2011

A first quick effort

Let's see if this works! Please click on the picture below, and start those comments rolling in. What should we do to improve this graph, and what does it mean?

Note: the haplogroup assignments are by FT DNA and include their predictions.

The data here is just our project data, but:-

  1. I combined STR and SNP data (two files that FT DNA's controls create for admins) into one sheet, lining up the data.
  2. I organized the countries and counties and invented the regions. This meant I also sometimes corrected what people had entered as their COUNTRY of origin, because it is often wrong: a lot of people apparently don't know which country some counties are in, or else they were hedging their bets. I have assumed their COUNTY information is correct, because that is the data we always push people to double-check in our project.
  3. I ran my own haplogroup prediction using Whit Athey's tool, but I have not used that information much yet.
  4. I removed everyone without a pedigree to a county. Maybe that was the most important step!
  5. I created a frequency table using pivot table functions, which I have already e-mailed to both of you, and a graphic representation of that frequency table, which now appears on the blog. (These are two worksheets in this spreadsheet.)
  6. I created a short version of the haplogroup names so that they all line up and look the same, not depending on the SNPs tested.
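
For anyone who would rather script these steps than do them in a spreadsheet, here is a rough sketch in Python of the same idea: line up the two files by kit number, drop anyone without a county, shorten the haplogroup names, and count frequencies per region. This is not how I actually did it (I used pivot tables). The field names, the sample rows, the region names and the "keep the first three characters" shortening rule are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up sample rows; the real admin exports have different columns.
str_rows = [
    {"kit": "K1", "county": "Cork", "region": "Southern Ireland"},
    {"kit": "K2", "county": "Cork", "region": "Southern Ireland"},
    {"kit": "K3", "county": "", "region": ""},  # no pedigree to a county
    {"kit": "K4", "county": "Fife", "region": "Eastern Scotland"},
]
snp_rows = [
    {"kit": "K1", "haplogroup": "I2a1"},
    {"kit": "K2", "haplogroup": "R1b1a2"},
    {"kit": "K3", "haplogroup": "G2a"},
    {"kit": "K4", "haplogroup": "I2a2b"},
]


def merge_by_kit(str_rows, snp_rows):
    """Line up the two files on kit number, keeping kits found in both."""
    snp_by_kit = {row["kit"]: row for row in snp_rows}
    return [
        {**row, **snp_by_kit[row["kit"]]}
        for row in str_rows
        if row["kit"] in snp_by_kit
    ]


def short_name(haplogroup, depth=3):
    """Shorten a haplogroup label so all labels have a comparable depth.

    Keeping the first `depth` characters is an assumed rule, not the
    one used for the graph on this blog.
    """
    return haplogroup[:depth]


def frequency_table(records):
    """Return {region: {short haplogroup: share of that region's men}}."""
    counts = defaultdict(Counter)
    for rec in records:
        if not rec.get("county"):
            continue  # remove everyone without a pedigree to a county
        counts[rec["region"]][short_name(rec["haplogroup"])] += 1
    table = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        table[region] = {hg: n / total for hg, n in counter.items()}
    return table


table = frequency_table(merge_by_kit(str_rows, snp_rows))
for region, shares in table.items():
    print(region, shares)
```

With the invented rows above, the man without a county is dropped before counting, so he does not affect any region's percentages.
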
And here is the data:-

And here are the regions I have used, in order to get big enough data sets, of people with pedigrees back to old counties:-

Here are a few first remarks:-
  • G levels highest in Wales and in the northern part of the Republic of Ireland. Remember that people are now saying this is a Neolithic farmer (pre-Celtic) marker, based on the relatively large number of G men found in old archaeological sites.
  • I2a in interesting patches: western Ireland, most of Scotland, NE England, and the extremity of SW England, but apparently almost absent in many areas neighbouring on these, like SE Scotland, NW England, and the counties neighbouring the extremity of the SW of England.
  • I2b almost invisible in southern Ireland and Wales, but at high frequencies in southern Scotland and northern Ireland, and also common in most of England.
  • I1a pretty common everywhere except in western Ireland, but if it is Anglo-Saxon, wouldn't you expect it to be higher in SE Scotland?
But I have to say that the prediction of haplogroup I subgroups, both by FT DNA and by Whit's predictor, can probably be improved upon. I have contacted the obvious people: Jim Cullen and Ken Nordtvedt. The I haplogroups perhaps deserve their own post in the near future.
