I struggled for several days to find an easy way to convert my sequence capture data to TreeMix format, but it turns out, it is not that hard after all. The general idea is to get your SNPs into vcf format, then use the populations script in STACKS to convert vcf to TreeMix. It’s that simple. Ignore all those really useless threads out there on the internet. If you still aren’t sure how to proceed, read on!
- Call SNPs
I had multiple sequence alignments from target capture data, and the first thing I needed to do was call SNPs. The key here is that you can only have one SNP per gene (not per exon), and it has to be biallelic (only two alleles). To do this, I used EAPhy, an awesome pipeline written by Moz Blom (https://github.com/MozesBlom/EAPhy). With a little hacking, I managed to make the SNPselection script work for my alignments. You have to edit the EAPhy_config.phy, and then follow the error messages to know what directories and files to create. If you have problems, email Moz! The pipeline is really designed to start at the beginning, but it will work for SNP calling from premade alignments.
- Convert to VCF
First I converted my phylip file from EAPhy to fasta in Geneious because PGD Spider likes the fasta better for SNPs for whatever reason. Then I used PGDSpider2, which I used a lot for this sort of thing, to convert the fasta files to .ped files for Plink. I then used plink to convert my ped files to VCF files, and then used vcftools –vcf input.vcf –recode to recode my files into a vcftools format. This was kind of the long way around, but I couldn’t get PGD spider to convert directly to VCF from fasta (it only converted 50 loci), and the VCF output from plink or PGDSpider can’t be read by STACKS. So, recode it in vcftools to read it into STACKS.
- Convert vcf to TreeMix in Populations (STACKS)
To convert with Populations you need a .map file, which just has sample \t population for each sample. And that is it! The command I used is populations -V file.vcf -M map.txt -O . –treemix
- Run TreeMix
I wrote a script to automate running TreeMix for each migration value (1-user defined limit. I am also working on summarizing the likelihood to choose the best run for each migration value. These can be downloaded on my GitHub at (https://github.com/kyleaoconnell22/TreeMix-scripts.git).
- Color the tips when plotting
I found this post to figure out how to do this (https://groups.google.com/forum/#!topic/structure-software/A9RrF35p_ZU), but the idea is to open the plotting.R script and hard code your population file into the plotting script so it can assign colors to the species.