The government is still shut down…so here is a post on ancient hybridization
High-profile journals have recently given a lot of attention to empirical examples of ancient hybridization. In this post I give a short background on the issue, highlight two recent papers, and then outline a few of the methods used to identify lineages affected by or produced by ancient hybridization.
Sometimes evolution proceeds in a manner not well represented by a bifurcating tree. Two processes can lead to the inference of incorrect phylogenies: incomplete lineage sorting (ILS) and hybridization. ILS occurs when ancestral genetic variation persists through successive speciation events, so that the lineages of a given gene sort at random and some gene trees do not match the species tree. Hybridization, on the other hand, occurs when two different species reproduce and a portion of the genome is exchanged, sometimes bi-directionally and sometimes uni-directionally. While this has been extensively documented between recently diverged (closely related) species, only recently have studies started to focus on more ancient hybridization, also called deep-time hybridization or reticulation.
Recently, a number of papers have explored this issue using several new-ish methods. These are not the only papers to explore this theme, but I picked two that I recently read to highlight here. First, Burbrink and Gehara (2018) estimate ancient hybridization between kingsnake lineages. Specifically, they use the program SNaQ, which can estimate hybridization while accounting for ILS. The method works using concordance factors estimated from gene trees. Concordance factors are numerical values representing the proportion of gene trees that support a given phylogenetic scenario (or phylogenetic relationship). This method can help differentiate between ILS and hybridization because a purely bifurcating tree (with or without ILS) will produce the highest concordance factor for the species tree scenario, with the alternative scenarios receiving roughly equal, lower support. Hybridization, on the other hand, will produce more intermediate concordance factors, as the hybrid species will group with each parental species in a substantial share of the gene trees. This is because different parts of the hybrid's genome come from different parental lineages, and thus support different evolutionary scenarios, so a sizable subset of the gene trees will differ from the species tree. SNaQ fits a pseudolikelihood function (don’t ask me what that is) to the concordance factors to pinpoint where in the tree hybridization occurred and to estimate how much each hybridization event contributed to the hybrid lineage. In the case of this paper, Burbrink and Gehara find that one clade of kingsnakes originated from hybridization between two other clades of kingsnakes in the ancient past. This hybridization event helps explain why different genes supported different evolutionary histories for this hybrid lineage, and why this group had been difficult to place in the phylogeny in past studies that used only a few genes.
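To make the idea of concordance factors concrete, here is a minimal Python sketch for a single set of four taxa. It assumes each gene tree has already been reduced to one of the three possible unrooted four-taxon groupings, written as strings like "AB|CD". The taxon labels, the function name, and the gene-tree counts are all made up for illustration; this is not SNaQ's code, just the counting step that its input is built on.

```python
from collections import Counter

# The three possible unrooted groupings of four taxa A, B, C, D.
QUARTET_SPLITS = ("AB|CD", "AC|BD", "AD|BC")


def quartet_concordance_factors(gene_tree_splits):
    """Return the proportion of gene trees supporting each quartet grouping."""
    counts = Counter(gene_tree_splits)
    unknown = set(counts) - set(QUARTET_SPLITS)
    if unknown:
        raise ValueError(f"unrecognized quartet groupings: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one gene tree")
    return {split: counts[split] / total for split in QUARTET_SPLITS}


# Hypothetical data: 100 gene trees per scenario (invented numbers).
ils_only = ["AB|CD"] * 80 + ["AC|BD"] * 10 + ["AD|BC"] * 10
with_hybridization = ["AB|CD"] * 55 + ["AC|BD"] * 40 + ["AD|BC"] * 5

print(quartet_concordance_factors(ils_only))
print(quartet_concordance_factors(with_hybridization))
```

In the first made-up scenario, one grouping dominates and the two alternatives are tied, which is what ILS alone is expected to produce. In the second, one alternative grouping is nearly as common as the main one, the kind of imbalance that points toward hybridization.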
Similarly, MacGuigan and Near (2018) focus on a problematic phylogenetic group of darters. Because gene trees for this group have historically been incongruent, they hypothesize that the clade may have originated via ancient hybridization, and use a number of analyses to test this hypothesis. First, they used D-statistic tests, also known as ABBA/BABA tests, to identify potential gene flow within sets of three taxa plus an outgroup. Similar to the concordance factor tests, ABBA/BABA uses the frequencies of shared alleles between species to differentiate between ILS and gene flow/hybridization. If the two discordant allele patterns (ABBA and BABA) occur at similar frequencies, then ILS alone can explain the discordance. But if one pattern is much more common than the other, so that shared alleles are biased toward a single taxon, then we infer that hybridization is responsible for the imbalance. At least, this is how I understand this test… Having never actually run the analysis, it is still hard to wrap my mind around! MacGuigan and Near also used two phylogenetic network approaches to test for ancient hybridization: SNaQ, as described above, and an alternative approach called PhyloNet. PhyloNet uses the gene trees themselves rather than concordance factors, offers a variety of algorithms, including parsimony and pseudolikelihood, and is in theory less computationally intense than SNaQ. Although the authors were still unable to place this lineage in the phylogeny with confidence, they were able to say that this phylogenetic uncertainty was the result of ancient gene flow rather than ILS.
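Since the ABBA/BABA logic is easier to see with numbers, here is a minimal Python sketch of the D-statistic, D = (ABBA - BABA) / (ABBA + BABA). It assumes one aligned sequence per taxon (P1, P2, P3, and an outgroup) and treats the outgroup allele as "A". The function names and the toy sequences are invented for illustration, and a real analysis would also need a significance test, which this sketch leaves out.

```python
def site_pattern(p1, p2, p3, outgroup):
    """Classify one alignment column; the outgroup allele is labeled 'A'."""
    alleles = (p1, p2, p3, outgroup)
    if len(set(alleles)) != 2:
        return None  # skip columns that are not biallelic
    return "".join("A" if allele == outgroup else "B" for allele in alleles)


def d_statistic(seq_p1, seq_p2, seq_p3, seq_out):
    """Compute D = (ABBA - BABA) / (ABBA + BABA) across aligned sites."""
    abba = baba = 0
    for column in zip(seq_p1, seq_p2, seq_p3, seq_out):
        pattern = site_pattern(*column)
        if pattern == "ABBA":
            abba += 1
        elif pattern == "BABA":
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


# Toy alignment (invented): three ABBA sites, one BABA site,
# one invariant site, and one BBAA site that the test ignores.
d = d_statistic("ACGTAC", "GTGACC", "GTGTCT", "ACGAAT")
print(d)
```

A value near zero is what ILS alone would produce, because ABBA and BABA sites should then be about equally common. A value far from zero means one pattern is in excess, suggesting gene flow between P3 and either P2 (positive) or P1 (negative).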
Another approach not used in these papers that can model ancient gene flow/hybridization is TreeMix. This method estimates a population tree based on allele frequencies, and then estimates admixture events based on allele frequencies in a single ‘species/population’ coming from multiple parents. Although the method is designed for estimating gene flow between populations within a species, it seems to also work well at the species level.
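The core idea of an admixed population having allele frequencies that come from two parents can be sketched in a few lines of Python. This is only a toy illustration and not TreeMix's actual algorithm: it assumes the mixed population's frequency at each SNP is a weighted average of the two parental frequencies, and recovers the weight by simple least squares. The function names and all the frequencies are made up.

```python
def expected_admixed_frequencies(freqs_a, freqs_b, weight_a):
    """Allele frequencies expected if a fraction weight_a of ancestry comes from A."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(freqs_a, freqs_b)]


def estimate_admixture_weight(freqs_a, freqs_b, freqs_mixed):
    """Least-squares estimate of the ancestry fraction contributed by parent A."""
    numerator = sum(
        (m - b) * (a - b) for a, b, m in zip(freqs_a, freqs_b, freqs_mixed)
    )
    denominator = sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b))
    if denominator == 0:
        raise ValueError("parental frequencies are identical at every SNP")
    return numerator / denominator


# Invented allele frequencies at four SNPs in two parental populations.
parent_a = [0.9, 0.2, 0.6, 0.1]
parent_b = [0.1, 0.8, 0.4, 0.7]

# Simulate a population that draws 30% of its ancestry from parent A.
mixed = expected_admixed_frequencies(parent_a, parent_b, 0.3)
weight = estimate_admixture_weight(parent_a, parent_b, mixed)
print(round(weight, 3))
```

Real data are noisier than this, because genetic drift shifts allele frequencies along every branch, which is part of why a dedicated program like TreeMix is needed.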
Having run or become familiar with several of these methods, I can say that SNaQ seems to estimate hybridization well but can only handle a limited number of taxa. PhyloNet can handle more taxa, but can also be pretty slow! The D-statistic test is limited to four taxa at a time, and thus may not be helpful for exploring hybridization across a whole phylogeny. Finally, TreeMix was written to model populations within a species, but has also been used to model multiple species. One major difference is that the network approaches (SNaQ and PhyloNet) use gene trees (SNaQ uses concordance factors estimated from the gene trees), while ABBA/BABA and TreeMix use allele frequencies from SNPs.
Burbrink FT, Gehara M. 2018. The Biogeography of Deep Time Phylogenetic Reticulation. Systematic Biology, 67, 743–755.
MacGuigan DJ, Near TJ. 2018. Phylogenomic Signatures of Ancient Introgression in a Rogue Lineage of Darters (Teleostei: Percidae). Systematic Biology.