We should actually do something

My thanks to Ken Nordtvedt for writing to ask for data from this project in a specific format. It is a useful push for action.

It has to be said that we still have a big mess of data which is difficult to use. Ideas on how to improve this are welcome.

One big issue is that we have an enormous number of participants who have no known male line connection to the British Isles at all. Many even know they are from somewhere else. The sheer number of such members does make all jobs difficult in my opinion, although I understand that people want to have their Y DNA in the database just to feel a link to the project.

Keep in mind that there is also an even larger number of members who believe that their ancestors were from the British Isles but do not know which country, or who know the country but nothing more. I am sure all experienced genealogists will agree that most of these people are reporting what are essentially guesses. (Many family stories are just the guesses of a previous generation.)

Anyway, enough complaining. One thing we have long aimed to do is to create haplogroup frequency data in a more user-friendly format. I am going to get started and at least do some preliminary work.

To start with, I've just made an Excel sheet from which I've deleted everyone who has not reported a clear county of origin in Britain or Ireland. That makes it much easier! Only about 1500 people!
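
For anyone who would rather script that filtering step than do it by hand in a spreadsheet, here is a minimal sketch. It assumes the member list has been exported to CSV; the column names (kit, country, county, haplogroup) and the sample rows are made up for illustration and are not the project's real layout.

```python
import csv
import io

# Hypothetical CSV export of the member sheet. The column names and rows
# are assumptions for illustration only, not real project data.
SAMPLE = """kit,country,county,haplogroup
1001,England,Devon,R1b
1002,Ireland,,I1
1003,Unknown,,R1a
1004,Scotland,Fife,R1b
"""

def with_known_county(rows):
    """Keep only rows whose 'county' field is non-empty after stripping."""
    return [r for r in rows if (r.get("county") or "").strip()]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = with_known_county(rows)
print(len(kept), "of", len(rows), "rows kept")
```

This only checks that something was entered in the county column. Deciding whether an entry counts as a clear county of origin would still need a human eye.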

So I would like to ask for opinions on how to divide up the populations in terms of haplogroups. Many participants have of course not been tested for any SNPs, while some have been tested for all the latest ones.

I suppose I'll need to run part of the data through a prediction program like Whit Athey's. Should I also ignore everyone with fewer than a certain number of markers?

Of the approximately 1500 people who know a county of origin in their male line, a bit more than half have only a predicted haplogroup, as it appears in the FT DNA data. About 740 have had real SNP tests.
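
To make the questions about marker cut-offs and predicted versus SNP-tested haplogroups more concrete, here is a rough sketch of how per-county frequencies could be tabulated under different rules. The records, the `min_markers` threshold and the `confirmed_only` switch are all hypothetical. It is just one possible way to compare the options, not a settled method.

```python
from collections import Counter, defaultdict

# Hypothetical records: (county, haplogroup, markers_tested, snp_confirmed).
# These values are invented for illustration, not taken from the project.
RECORDS = [
    ("Devon", "R1b", 37, True),
    ("Devon", "R1b", 12, False),
    ("Devon", "I1", 67, False),
    ("Fife", "R1b", 25, True),
    ("Fife", "R1a", 37, False),
]

def haplogroup_frequencies(records, min_markers=0, confirmed_only=False):
    """Return {county: {haplogroup: share}} for records passing the filters."""
    counts = defaultdict(Counter)
    for county, haplogroup, markers, confirmed in records:
        if markers < min_markers:
            continue  # too few STR markers to trust a prediction
        if confirmed_only and not confirmed:
            continue  # keep only SNP-tested members
        counts[county][haplogroup] += 1
    freqs = {}
    for county, counter in counts.items():
        total = sum(counter.values())
        freqs[county] = {hap: n / total for hap, n in counter.items()}
    return freqs

all_freqs = haplogroup_frequencies(RECORDS)
strict_freqs = haplogroup_frequencies(RECORDS, min_markers=25)
confirmed_freqs = haplogroup_frequencies(RECORDS, confirmed_only=True)
```

Running the same data with and without the filters shows how much the frequencies move, which might help decide where to set a marker cut-off.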

