Wednesday, February 15, 2012

Correspondence between ChromoPainter clusters and ADMIXTURE components in Balkans/West Asia

I took the 25 inferred clusters from my recent ChromoPainter analysis and calculated their normalized median components in terms of the K12b calculator. This is quite a useful exercise, since it shows in which ADMIXTURE components the clusters differ from each other.
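A minimal sketch of how such per-cluster profiles could be computed. It assumes that "normalized median" means taking the median of each component within a cluster and rescaling the medians to sum to 100; the post does not spell out the exact procedure. The function name, individual IDs, cluster label, and numbers are all made up for illustration.

```python
from statistics import median

def normalized_median_components(proportions, assignments):
    """Return {cluster: {component: percent}}, each cluster rescaled to sum to 100.

    proportions: {individual: {component: percent}} (ADMIXTURE-style output)
    assignments: {individual: cluster label}
    Assumption: "normalized" means rescaling the medians to a total of 100.
    """
    by_cluster = {}
    for ind, comps in proportions.items():
        by_cluster.setdefault(assignments[ind], []).append(comps)

    result = {}
    for cluster, rows in by_cluster.items():
        components = list(rows[0].keys())
        medians = {c: median(r[c] for r in rows) for c in components}
        total = sum(medians.values())
        result[cluster] = {
            c: (100.0 * m / total if total else 0.0) for c, m in medians.items()
        }
    return result

# Hypothetical individuals and values, not taken from the actual analysis.
proportions = {
    "ind1": {"Caucasus": 40.0, "North_European": 20.0, "Atlantic_Baltic": 40.0},
    "ind2": {"Caucasus": 50.0, "North_European": 10.0, "Atlantic_Baltic": 40.0},
    "ind3": {"Caucasus": 60.0, "North_European": 20.0, "Atlantic_Baltic": 20.0},
}
assignments = {"ind1": "popA", "ind2": "popA", "ind3": "popA"}

profile = normalized_median_components(proportions, assignments)
```

The rescaling step matters because component-wise medians need not sum to 100 even when every individual's proportions do.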

Here are two ways in which you may use this correspondence.

1. Different clusters of a single population

For example, the Turks with partial Balkan ancestry tend to belong to pop10, whereas those of Anatolian ancestry tend to belong to pop13, and those from northeastern Anatolia to pop22. If we compare the admixture proportions of these three groups, we notice, for example:

  • An excess of Atlantic_Baltic and North_European in pop10
  • An excess of Caucasus in pop22

Similarly, a group of 5 Iranians belongs to pop12, whereas the overwhelming majority of Iranians and Kurds belong to pop21. Strikingly, pop12 differs from all other populations in having substantial levels of East_African and Sub_Saharan, so fineSTRUCTURE evidently inferred that these Iranian individuals had this feature in common. They were already evident in the Iranian population portrait (right), but fineSTRUCTURE was able to group them even though there were no African populations in the ChromoPainter analysis; presumably, the software detected that these individuals shared a set of chunks quite different from the norm for the Balkan/West Asian area.

2. Related clusters

fineSTRUCTURE arranged the inferred populations in a tree structure. For example, it grouped pop18, the "North Balkan" cluster, with pop23, the "Bulgarian-Romanian" one.

Looking at the admixture proportions, we can tell that the two clusters do indeed seem quite similar, but there are some differences, e.g., an excess of North_European in pop18, and an excess of Caucasus in pop23. This makes sense given the geographical origin of individuals belonging to the two clusters.
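This kind of comparison between two cluster profiles can be sketched as follows. The function name, the 2-point threshold, the cluster names, and the numbers are hypothetical and chosen only to illustrate the idea; they are not the actual values for pop18 and pop23.

```python
def component_excess(profile_a, profile_b, threshold=1.0):
    """Split components by which profile has the excess.

    Returns (excess_in_a, excess_in_b): components whose difference is at
    least `threshold` percentage points in favour of a or of b.
    """
    diffs = {c: profile_a[c] - profile_b[c] for c in profile_a}
    excess_a = {c: d for c, d in diffs.items() if d >= threshold}
    excess_b = {c: -d for c, d in diffs.items() if d <= -threshold}
    return excess_a, excess_b

# Made-up profiles for two related clusters (each sums to 100).
cluster_x = {"North_European": 30.0, "Caucasus": 20.0, "Atlantic_Med": 50.0}
cluster_y = {"North_European": 25.0, "Caucasus": 26.0, "Atlantic_Med": 49.0}

excess_x, excess_y = component_excess(cluster_x, cluster_y, threshold=2.0)
```

With these made-up numbers, cluster_x shows an excess of North_European and cluster_y an excess of Caucasus, while the 1-point Atlantic_Med difference falls below the threshold and is ignored.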

Tuesday, February 14, 2012

ChromoPainter/fineSTRUCTURE analysis of Balkans/West Asia

I have carried out a ChromoPainter/fineSTRUCTURE analysis of Balkans/West Asia. The dataset differs slightly from the one used in the previous fastIBD analysis of the same region. The analysis also took much longer to complete (about a week, with two CPUs dedicated to the task), so it is not something that can be done routinely.

Technical details (skip if you want)

413 individuals from 33 populations were studied, on 258,100 SNPs, after --geno 0.03 --maf 0.01 filters were applied. Data were phased in Beagle with the default 10 iterations. Genetic maps from the HapMap were used. fineSTRUCTURE was used on ChromoPainter output, with 500,000 burnin/runtime iterations each.
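To make the two SNP filters concrete, here is a rough sketch of what they do, assuming the options carry their usual PLINK meaning: --geno 0.03 drops SNPs whose missing-call rate exceeds 3%, and --maf 0.01 drops SNPs whose minor allele frequency is below 1%. The function and the toy genotype vectors are illustrative only and are not the actual pipeline.

```python
def passes_filters(genotypes, max_missing=0.03, min_maf=0.01):
    """Decide whether one SNP survives missingness and MAF filters.

    genotypes: one entry per individual, 0/1/2 copies of an allele,
               or None for a missing call.
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = (n - len(called)) / n
    if missing_rate > max_missing:      # analogous to --geno
        return False
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    return maf >= min_maf               # analogous to --maf

# Toy SNPs across 100 hypothetical individuals.
snp_ok = [0] * 50 + [1] * 50                      # no missing calls, MAF 0.25
snp_missing = [0] * 45 + [1] * 45 + [None] * 10   # 10% of calls missing
snp_monomorphic = [0] * 100                       # MAF 0

kept = [passes_filters(s) for s in (snp_ok, snp_missing, snp_monomorphic)]
```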

25 Inferred Populations

fineSTRUCTURE imposes a tree structure on a number of inferred populations. The following heatmap shows this tree structure; columns represent donor populations and rows represent recipient ones.

There was a total of 25 populations, labeled pop0, pop1, ..., pop24.

The following table summarizes how many individuals from each original population were assigned to each inferred population:
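A table of this kind is a plain cross-tabulation of original labels against inferred clusters. A minimal sketch, with hypothetical individual IDs and cluster names rather than the real assignments:

```python
from collections import Counter

def population_matrix(original_pop, inferred_pop):
    """Count individuals per (original population, inferred cluster) pair.

    Both arguments map individual ID -> label. Returns a nested dict
    {original population: {inferred cluster: count}}.
    """
    counts = Counter((original_pop[i], inferred_pop[i]) for i in original_pop)
    matrix = {}
    for (orig, cluster), n in counts.items():
        matrix.setdefault(orig, {})[cluster] = n
    return matrix

# Made-up labels for three individuals.
original_pop = {"i1": "Greek", "i2": "Greek", "i3": "Turkish"}
inferred_pop = {"i1": "popA", "i2": "popB", "i3": "popB"}

matrix = population_matrix(original_pop, inferred_pop)
```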

I will limit myself to populations which include Dodecad Project members:

  • pop6 includes a Project North Ossetian, as well as all Yunusbayev et al. North Ossetians
  • pop7 is mainly Armenian
  • pop16 is also mainly Armenian; it would be interesting to see whether this bipartite division of Armenians is in agreement with the one inferred in the previous fastIBD analysis
  • pop8 is mainly Greek, and appears to be "continental Greek"; it also includes some other Balkan individuals
  • pop14 is also Greek, and includes a variety of people with ancestry from Crete, the Aegean, Cyprus, Asia Minor, Cappadocia, and the Pontus, as well as continental Greece. It could be labeled "eastern Greek"
  • pop11 is Cypriot, including the single 100% Greek Cypriot of the Project, all three 100% Turkish Cypriots, as well as a Turkish individual of partial Turkish_Cypriot ancestry
  • pop10 is Turkish, and includes people with some ancestry from the Balkans, as well as Anatolia. It could be labeled "Balkan Turkish"
  • pop13 is also Turkish, and seems to include people with ancestry exclusively from Anatolia, including almost all the Behar et al. Turks
  • pop15 is Assyrian; some Assyrians also fall in the aforementioned pop16, which includes mainly Armenians
  • pop18 could be labeled "North Balkan"; there is probably structure to be uncovered within this cluster, once more participants from the Balkans join the Project
  • pop20 is "Georgian-Abkhazian"
  • pop21 is "Kurdish-Iranian"
  • pop22 could be labeled "Northeastern Anatolia" or (more classically) "Pontus-Colchis". It appears to unite various individuals from Northeastern Turkey and neighboring Georgia, having Karadeniz Turkish, Armenian, Pontic Greek, and Kartvelian ancestry. I strongly encourage participants from this region to join the Project, especially Pontic Greeks, as there are no 100% Pontic Greeks currently in the Project.
  • pop23 is "Bulgarian-Romanian" mainly, and also includes one Serb. Once again, I emphasize that the power of this approach using haplotypes depends on participation, so I encourage all people from the Balkans to consider joining the Project.

Principal Components Analysis

I have also used the PCA feature of fineSTRUCTURE to carry out principal components analysis. I am plotting the first two dimensions of this PCA, using my own visualization code that places labels in the average position on the plane:
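Placing a label at the average position amounts to computing a centroid per population. The sketch below illustrates that idea and is not the visualization code used for the plot; the function name, IDs, and coordinates are hypothetical.

```python
def label_positions(coords, labels):
    """Return {label: (mean_x, mean_y)} over the individuals carrying each label.

    coords: {individual: (pc1, pc2)}
    labels: {individual: population label}
    """
    sums = {}
    for ind, (x, y) in coords.items():
        sx, sy, n = sums.get(labels[ind], (0.0, 0.0, 0))
        sums[labels[ind]] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

# Made-up PCA coordinates for three individuals.
coords = {"a": (0.0, 0.0), "b": (2.0, 4.0), "c": (-1.0, 1.0)}
labels = {"a": "Greek", "b": "Greek", "c": "Armenian"}

positions = label_positions(coords, labels)
```

One consequence of this design is that a label for a population split across two distant clusters lands between them, where few or none of its members actually sit.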


Results for Project participants are included in the spreadsheet.

  • Population matrix, shows how many individuals from each population were assigned to each cluster
  • Z score population matrix, shows the normalized number of "chunks" from each donor population (columns) to each recipient (row). Do not compare across rows! The way to read this table is the following: for each row, higher values indicate more sharing. For example, the "Cypriots" population has pop11 as its main donor.
  • Individual assignments: the pop number that all Project and reference IDs were assigned to
  • Individual Chunkcounts: the number of chunks copied from each donor population (column) to each individual (row)
  • Individual PCA: your PCA co-ordinates that can help you find your dot on the Principal Components Analysis graphic (see above)

Averaged results were included only for populations with >=5 members.
The raw chunkcounts for all 413x413 individuals can be found here.
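The warning against comparing across rows follows naturally if the Z scores are standardized within each recipient row. The sketch below assumes exactly that (subtract the row mean, divide by the row standard deviation); the precise normalization used for the spreadsheet is not stated here, and the recipient, donor names, and chunk counts are made up.

```python
from statistics import mean, pstdev

def row_z_scores(chunk_matrix):
    """Standardize chunk counts within each recipient row.

    chunk_matrix: {recipient: {donor: chunk count}}
    Assumption: each row is normalized on its own, so values are only
    comparable within a row, never across rows.
    """
    z = {}
    for recipient, row in chunk_matrix.items():
        values = list(row.values())
        mu = mean(values)
        sigma = pstdev(values)
        z[recipient] = {
            donor: ((v - mu) / sigma if sigma else 0.0) for donor, v in row.items()
        }
    return z

def main_donor(z_row):
    """Donor with the highest Z score in a single recipient row."""
    return max(z_row, key=z_row.get)

# Made-up chunk counts for one recipient population.
chunk_matrix = {"RecipientX": {"popA": 10.0, "popB": 20.0, "popC": 30.0}}

z = row_z_scores(chunk_matrix)
top = main_donor(z["RecipientX"])
```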

Monday, February 6, 2012

Other testing companies

The Dodecad Project is not affiliated with any genetic testing companies. Until now, I have included Project participants from 23andMe and FamilyTreeDNA "Family Finder" tests, but it has come to my attention that there are new players in the field, such as (see post on Your Genetic Genealogist) and Lumigenix (see post on GenomesUnzipped).

If you have data from any company entering this field, please contact me at (do not send data right away!). That way, I can find out how many markers are in common between the new tests and my existing datasets, and figure out how easy it will be to convert them for use in the Project and in DIYDodecad.
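Checking how many markers two datasets share is essentially a set intersection over SNP identifiers. A minimal sketch, with hypothetical rsIDs and a made-up function name; real files would first need parsing and possibly strand and build harmonization, which is not shown.

```python
def marker_overlap(new_test_snps, reference_snps):
    """Return (number of shared markers, fraction of the reference set covered)."""
    new_set, ref_set = set(new_test_snps), set(reference_snps)
    common = new_set & ref_set
    fraction = len(common) / len(ref_set) if ref_set else 0.0
    return len(common), fraction

# Placeholder identifiers, not real chip content.
new_test_snps = ["rs1", "rs2", "rs3"]
reference_snps = ["rs2", "rs3", "rs4", "rs5"]

n_common, fraction = marker_overlap(new_test_snps, reference_snps)
```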